You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Sahlberg <da...@gmail.com> on 2021/08/09 01:12:03 UTC

[PATCH] Issue #4711

Hi,

I've been researching issue #4711 [1] and, as requested by danielsh in 
JIRA, I'm turning to dev@ to discuss a possible solution.

As already found by danielsh, the problem lies in the filtering code in 
svn_cl__log_entry_receiver_xml. If the log entry doesn't match the 
search pattern, the code will return without emitting any opening tag. 
However when the library calls back to close a tag, it is done without 
checking if the opening tag was emitted or not.

Without filtering, the tree is recursive. I have made a test case where 
I have three branches, a, b and c. I've made commits to a, which I have 
subsequently merged to b and then from b to c. The XML is as follows:

[[[

     <logentry revision="12">
         <author>dsahlberg</author>
         <date>2021-08-08T19:29:38.411358Z</date>
         <msg>merge b to c</msg>
         <logentry reverse-merge="false" revision="11">
             <author>dsahlberg</author>
             <date>2021-08-08T19:29:26.563564Z</date>
             <msg>merge a to b</msg>
             <logentry revision="10" reverse-merge="false">
                 <author>dsahlberg</author>
<date>2021-08-08T19:29:10.860725Z</date>
                 <msg>add 1 file</msg>
             </logentry>
         </logentry>
     </logentry>
]]]

I'm proposing to keep the XML tree intact, but filter out the 
information that doesn't match the search pattern. If searching for "add 
1 file", the result would be as follows:

[[[

     <logentry revision="12">
         <logentry revision="11">
             <logentry revision="10" reverse-merge="false">
                 <author>dsahlberg</author>
<date>2021-08-08T19:29:10.860725Z</date>
                 <msg>add 1 file</msg>
             </logentry>
         </logentry>
     </logentry>
]]]

The most simple solution is to add the opening tag (<logentry 
revision="n">) whenever a log entry has children:
[[[

Index: subversion/svn/log-cmd.c
===================================================================
--- subversion/svn/log-cmd.c    (revision 1892053)
+++ subversion/svn/log-cmd.c    (working copy)
@@ -552,6 +552,7 @@
        return SVN_NO_ERROR;
      }

+  revstr = apr_psprintf(pool, "%ld", log_entry->revision);
    /* Match search pattern before XML-escaping. */
    if (lb->search_patterns &&
        ! match_search_patterns(lb->search_patterns, author, date, message,
@@ -559,6 +560,9 @@
      {
        if (log_entry->has_children)
          {
+          svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "logentry",
+                              "revision", revstr, SVN_VA_NULL);
+          SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
            if (! lb->merge_stack)
              lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(svn_revnum_t));

@@ -575,7 +579,6 @@
    if (message)
      message = svn_xml_fuzzy_escape(message, pool);

-  revstr = apr_psprintf(pool, "%ld", log_entry->revision);
    /* <logentry revision="xxx"> */
    if (lb->merge_stack && lb->merge_stack->nelts > 0)
      {

]]]

The problem with this solution is that it might emit empty XML elements, 
like this:

[[[

     <logentry revision="6">
     </logentry>
]]]

Depending on how the XML parser is built and how the information is 
used/displayed this may or may not be a problem. As far as I understand, 
most (if not all) tags are optional so it should not be an issue that 
there are no children (for example <msg></msg>). But it may be that an 
empty revision 6 is displayed in this case.

A more complex solution would require keeping track of the XML tags that 
should be emitted if a child matches the search pattern. There is 
already an array (lb->merge_stack) that could probably be used for this 
purpose. Instead of just storing the revision number it would have to 
store if the opening tag has been emitted or not. When a child matches 
the search pattern, all parent nodes that has not previously been 
emitted will be emitted. When the library signals to close a tag, the 
code checks if the corresponding open tag has been emitted or not.

The following is a concept for such a patch (I know HACKING has some 
things to say about how to name typedefs etc, but I don't have time to 
study it now, there are also whitespace issues). If this is a more 
desirable solution (ie, I get some +1 (concept)) then I will cleanup the 
patch and repost.

[[[

Index: subversion/svn/log-cmd.c
===================================================================
--- subversion/svn/log-cmd.c    (revision 1892053)
+++ subversion/svn/log-cmd.c    (working copy)
@@ -45,6 +45,11 @@

  #include "svn_private_config.h"

+typedef struct merge_stack {
+  svn_revnum_t rev;
+  svn_boolean_t emitted;
+} merge_stack_t;
+
  
  /*** Code. ***/

@@ -355,10 +360,14 @@
      {
        if (log_entry->has_children)
          {
+      merge_stack_t ms;
+
            if (! lb->merge_stack)
-            lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(svn_revnum_t));
+            lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(merge_stack_t));

-          APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = 
log_entry->revision;
+      ms.rev = log_entry->revision;
+      ms.emitted = FALSE;
+          APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
          }

        return SVN_NO_ERROR;
@@ -435,10 +444,10 @@
        iterpool = svn_pool_create(pool);
        for (i = 0; i < lb->merge_stack->nelts; i++)
          {
-          svn_revnum_t rev = APR_ARRAY_IDX(lb->merge_stack, i, 
svn_revnum_t);
+          merge_stack_t ms = APR_ARRAY_IDX(lb->merge_stack, i, 
merge_stack_t);

            svn_pool_clear(iterpool);
-          SVN_ERR(svn_cmdline_printf(iterpool, " r%ld%c", rev,
+          SVN_ERR(svn_cmdline_printf(iterpool, " r%ld%c", ms.rev,
                                       i == lb->merge_stack->nelts - 1 ?
'\n' : ','));
          }
@@ -475,10 +484,14 @@

    if (log_entry->has_children)
      {
+      merge_stack_t ms;
+
        if (! lb->merge_stack)
-        lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(svn_revnum_t));
+        lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(merge_stack_t));

-      APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = log_entry->revision;
+      ms.rev = log_entry->revision;
+      ms.emitted = FALSE;
+      APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
      }

    return SVN_NO_ERROR;
@@ -533,6 +546,7 @@
    const char *author;
    const char *date;
    const char *message;
+  int i;

    if (lb->ctx->cancel_func)
SVN_ERR(lb->ctx->cancel_func(lb->ctx->cancel_baton));
@@ -544,11 +558,14 @@

    if (! SVN_IS_VALID_REVNUM(log_entry->revision))
      {
-      svn_xml_make_close_tag(&sb, pool, "logentry");
-      SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
-      if (lb->merge_stack)
-        apr_array_pop(lb->merge_stack);
-
+      if (lb->merge_stack) {
+    if (lb->merge_stack->nelts > 0 && (APR_ARRAY_IDX(lb->merge_stack, 
lb->merge_stack->nelts-1, merge_stack_t).emitted))
+      {
+        svn_xml_make_close_tag(&sb, pool, "logentry");
+        SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
+      }
+    apr_array_pop(lb->merge_stack);
+      }
        return SVN_NO_ERROR;
      }

@@ -559,15 +576,32 @@
      {
        if (log_entry->has_children)
          {
+      merge_stack_t ms;
            if (! lb->merge_stack)
-            lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(svn_revnum_t));
+            lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(merge_stack_t));

-          APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = 
log_entry->revision;
+      ms.rev = log_entry->revision;
+      ms.emitted = FALSE;
+          APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
          }

        return SVN_NO_ERROR;
      }
-
+  else if (lb->search_patterns && lb->merge_stack) {
+    /* match_search_patterns returned true */
+    for (i = 0; i < lb->merge_stack->nelts; i++)
+      {
+    if (!APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).emitted)
+      {
+        revstr = apr_psprintf(pool, "%ld",
+                  APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).rev);
+        svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "logentry",
+                  "revision", revstr, SVN_VA_NULL);
+        APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).emitted = TRUE;
+      }
+      }
+  }
+
    if (author)
      author = svn_xml_fuzzy_escape(author, pool);
    if (date)
@@ -606,7 +640,6 @@
    if (log_entry->changed_paths2)
      {
        apr_array_header_t *sorted_paths;
-      int i;

        /* <paths> */
        svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "paths",
@@ -684,10 +717,14 @@

    if (log_entry->has_children)
      {
+      merge_stack_t ms;
+
        if (! lb->merge_stack)
-        lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(svn_revnum_t));
+        lb->merge_stack = apr_array_make(lb->pool, 1, 
sizeof(merge_stack_t));

-      APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = log_entry->revision;
+      ms.rev = log_entry->revision;
+      ms.emitted = TRUE;
+      APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
      }
    else
      svn_xml_make_close_tag(&sb, pool, "logentry");

]]]

I have run the testsuite with both patch versions without any problems. 
(I have obviously removed XFail() from the test for #4711 in log_tests.py).

Kind regards,

Daniel Sahlberg


[1] https://issues.apache.org/jira/browse/SVN-4711


Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Thu, 19 Aug 2021 12:51 +00:00:
> Daniel Sahlberg wrote on Wed, Aug 18, 2021 at 07:54:47 +0200:
> > The larger question is "do we want it?".

> I think we should include it, because:
> 
> - The API provides it.  The CLI be to the API as the plain log output is
>   to the XML output.
> 
> - It adds information: absence does not imply the value would be "true",
>   nor that it would be "false".

> > In other words: Is it a property of r5 (which was filtered out, so
> > it should be filtered out as well)

It's definitely not a property of r5.  It's quite possible for r5 to be
forward merged into /foo in r6 and reverse merged into /bar in r8.
It's even possible for r5 to be forward merged into /foo and reverse
merged into /bar _in the same revision_.  Thus, the "reverse-merge"
attribute is, a property of the revision in which r5 was (forward or
reverse) merged.

> > or is it a property necessary to reconstruct the path to r4.

Agree: The reverse-merge attribute on r5 is part of explaining how r4 is
related to the target of the log (the path it was run against), so
I think that too argues for keeping that attribute in the output.

Is there an argument in favour of eliding that attribute that I'm
overlooking?  [honest question]


Re: [PATCH] Issue #4711 invalid xml file produced by svn log --xml -g --search

Posted by Daniel Sahlberg <da...@gmail.com>.
Den fre 24 dec. 2021 kl 20:44 skrev Daniel Sahlberg <
daniel.l.sahlberg@gmail.com>:

> Sorry for the long delay but I finally found some time to circle back to
> this.
>
> Den tors 26 aug. 2021 kl 17:21 skrev Daniel Shahaf <d.s@daniel.shahaf.name
> >:
>
>> > I'm leaning towards setting this as a milestone=2.0 issue and for the
>> time
>> > leaving it alone not to risk destabilising anything.
>>
>> That's an option.  Or we could include this change in the next alpha or
>> beta release to get feedback on it.  (We might even be able to use
>> a preprocessor timebomb so the change is automatically reverted between
>> the last beta and rc1 unless we take positive action to keep it;
>> cf. r1889012.)
>>
>
> [removed a long thread regarding showing "reverse merges" in the text
> version output of svn log]
>
> I agree with Daniel Shahaf in principle, but I'm ignoring this question
> right now to focus on fixing the svn log --xml issue. Anyone wanting to
> work on the text output can easily pick up from here. I might do it myself
> in a future fix.
>
> > Have you reviewed the patch from a coding style perspective? I would
>> > appreciate a +1 from someone before committing.
>>
>> I have now.  A few things:
>>
>> - It's useful to describe the issue (e.g., copy its title) rather than
>>   refer to it just by number.  (subject line)
>>
>
> Mail thread. Noted, subject line is now updated.
>
> - Indentation was mangled by emailing, so I didn't review it.
>>
>
> Attaching the fix in a file. I have reviewed (and fixed a bunch of issues)
> it so hope it should be alright.
>
>
>> - Some lines appeared to exceed the 80 columns limit.
>>
>
> Fixed.
>
>
>> - I'd have declared «int i;» twice in the two relevant blocks, rather
>>   than moved it to function scope, so it would have the least possible
>>   scope.  That would convert some bugs (using «i» in a wrong place) into
>>   compile errors (using an undeclared variable).
>>
>
> Ok! (My primary language doesn't allow declarations in blocks so I forgot
> that it's the C way).
>
>
>> + Order of elements of the new struct is good: wider types before
>>   narrower ones.  (Matters for padding bytes.)
>>
>> That's a review for style.  I didn't review correctness.
>
>
> I hope someone can have a look at this. It seems to work for the different
> cases I have thrown at it including the original
> log_with_merge_history_and_search test.
>
> In the patch I've extended log_with_merge_history_and_search with a more
> complex merge scenario (same as I've used up-thread). This works as well,
> but I didn't find a way to verify that the xml structure is nested the way
> it is supposed to do.
>

I would like to commit this fix and unless anyone objects I will do this
during next week.

In addition to testing on my own tree and the test case in #4711, I have
also run the following command on 1.13.0 (r1867053), as supplied in Ubuntu
20.04.3, on 1.14.2-dev (built from a recent trunk) as well as current trunk
with the attached patch:

[[[
svn log --search=revision -g --xml
https://svn.apache.org/repos/asf/subversion/branches/1.10.x
]]]

In 1.13.0 and 1.14.2-dev this produce a lot of mismatched </logentry> end
tags. With this patch the XML file is balanced.

Kind regards,
Daniel

Re: [PATCH] Issue #4711 invalid xml file produced by svn log --xml -g --search

Posted by Daniel Sahlberg <da...@gmail.com>.
Sorry for the long delay but I finally found some time to circle back to
this.

Den tors 26 aug. 2021 kl 17:21 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> > I'm leaning towards setting this as a milestone=2.0 issue and for the
> time
> > leaving it alone not to risk destabilising anything.
>
> That's an option.  Or we could include this change in the next alpha or
> beta release to get feedback on it.  (We might even be able to use
> a preprocessor timebomb so the change is automatically reverted between
> the last beta and rc1 unless we take positive action to keep it;
> cf. r1889012.)
>

[removed a long thread regarding showing "reverse merges" in the text
version output of svn log]

I agree with Daniel Shahaf in principle, but I'm ignoring this question
right now to focus on fixing the svn log --xml issue. Anyone wanting to
work on the text output can easily pick up from here. I might do it myself
in a future fix.

> Have you reviewed the patch from a coding style perspective? I would
> > appreciate a +1 from someone before committing.
>
> I have now.  A few things:
>
> - It's useful to describe the issue (e.g., copy its title) rather than
>   refer to it just by number.  (subject line)
>

Mail thread. Noted, subject line is now updated.

- Indentation was mangled by emailing, so I didn't review it.
>

Attaching the fix in a file. I have reviewed (and fixed a bunch of issues)
it so hope it should be alright.


> - Some lines appeared to exceed the 80 columns limit.
>

Fixed.


> - I'd have declared «int i;» twice in the two relevant blocks, rather
>   than moved it to function scope, so it would have the least possible
>   scope.  That would convert some bugs (using «i» in a wrong place) into
>   compile errors (using an undeclared variable).
>

Ok! (My primary language doesn't allow declarations in blocks so I forgot
that it's the C way).


> + Order of elements of the new struct is good: wider types before
>   narrower ones.  (Matters for padding bytes.)
>
> That's a review for style.  I didn't review correctness.


I hope someone can have a look at this. It seems to work for the different
cases I have thrown at it including the original
log_with_merge_history_and_search test.

In the patch I've extended log_with_merge_history_and_search with a more
complex merge scenario (same as I've used up-thread). This works as well,
but I didn't find a way to verify that the xml structure is nested the way
it is supposed to do.

Kind regards,
Daniel Sahlberg

Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Thu, Aug 26, 2021 at 14:44:04 +0200:
> Den lör 21 aug. 2021 kl 05:18 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> 
> > Daniel Sahlberg wrote on Fri, 20 Aug 2021 10:30 +00:00:
> > > Den fre 20 aug. 2021 kl 12:11 skrev Daniel Shahaf <
> > d.s@daniel.shahaf.name>:
> > > > Daniel Sahlberg wrote on Thu, Aug 19, 2021 at 23:23:49 +0200:
> > > > >  [[[
> > > > ⋮
> > > > >
> > ------------------------------------------------------------------------
> > > > > r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1
> > line
> > > > > Merged via: r7, r5
> > > > > ]]]
> > > > > As can be seen here, it is not visible in the plain text output that
> > r7 was
> > > > > actually a reverse merge.
> > > >
> > > > Hmm.  There's an interesting question of whether we can amend the
> > output
> > > > to make that clear.  The CLI output is an API, and generally we try not
> > > > to change it unless the user opts in to new features (e.g., when
> > > > changelists were added, the CLI output for users who don't use
> > > > changelists wasn't changed), but there are exceptions (e.g., adding the
> > > > seventh column to `svn status` non-XML output).
> > >
> > > The two options today are:
> > > - Reverse merged via: r7
> > > or:
> > > - Merged via: r7, r5
> > >
> > > The code is looking at the the way r4 was merged and then prints the
> > > revision numbers from the merge stack. With the patch we have enough
> > > information on the merge stack to print "(reverse)" on each revision
> > > that was subtractive, for example:
> > > - Merged via: r7 (reverse), r5
> > >
> > > Another option would be to print both (and each revision only in the
> > > line where it belongs).
> > > - Reverse merged via: r7
> > > - Merged via: r5
> > > To do this one would have to loop through the merge_stack twice (at
> > > least) or keep a reasonably long buffer to store the list of revisions
> > > in case of a long stack.
> > >
> >
> > Hmm.  In favour of the second option is that it doesn't require CLI-
> > parsing tools to learn about new syntax, but only to learn about two
> > existing syntaxes being able to appear concurrently.  With a little
> > luck, some of them will DTRT.
> >
> > Is there anything the first option can represent that the second one
> > can't?  Perhaps some tree-like cases (rN was a merge of both rFOO and
> > rBAR), but those aren't representable in the plain output anyway.
> >
> 
> > For a buffer, we have svn_stringbuf_t that automatically reallocates
> > itself as needed; and in any case, a two-pass algorithm would be easy to
> > implement and to read and performant (it'd still be O(N)).
> >
> > > Both these changes might mess it up for someone consuming the CLI
> > > output and expecting to find one or the other of Merged via: or Reverse
> > > merged via:, or expecting to find just a simple list of revisions. I
> > > can't judge the risk/reward for changing this.
> >
> > Would it be accurate to say that the incumbent output is wrong,
> > misleading, a bug?
> >
> 
> It would at least be misleading, since it isn't possible to figure out if a
> merge some way down the tree was a forward or reverse merge. But that would
> be easy to discover when looking at the code or inspecting the merges more
> closely.

I'm not sure if "that would be easy to discover" is good enough
a reason.  That's an argument could also be used to justify, say,
deleting the «svn diff» command.  Determining reverse merges is
something we can do easily and would be useful, so why not do it?

> If so, we could make the change, as a bugfix.  The release notes could
> > point it out and point to the XML output as a more-stable alternative.
> >
> > If not, we could record the idea as a milestone=2.0 issue.
> >
> 
> I have let this question soak for a few days hoping someone else with more
> experience would chime in.
> 

Personally, I think I'm leaning towards "it's misleading".  I'd probably
prefer this alternative —

> > > Another option would be to print both (and each revision only in the
> > > line where it belongs).
> > > - Reverse merged via: r7
> > > - Merged via: r5

— since it requires the minimal change to parsers: only to be aware that
if a "Reverse merged" line is found, there may be a "Merged via" line
before the next blank line.  I expect few parsers would need to be
adjusted for this, and it would be more friendly to humans.

However, this _would_ be an incompatible change for anyone parsing only
the "Merged via" line, so I can certainly see the logic behind relegating
this to 2.0…

… except that there's nothing preventing us from adding _another_ line
to the output, alongside the existing «Merged via: r7, r5» line, to
indicate the merge directions with.

> I'm leaning towards setting this as a milestone=2.0 issue and for the time
> leaving it alone not to risk destabilising anything.

That's an option.  Or we could include this change in the next alpha or
beta release to get feedback on it.  (We might even be able to use
a preprocessor timebomb so the change is automatically reverted between
the last beta and rc1 unless we take positive action to keep it;
cf. r1889012.)

> Have you reviewed the patch from a coding style perspective? I would
> appreciate a +1 from someone before committing.

I have now.  A few things:

- It's useful to describe the issue (e.g., copy its title) rather than
  refer to it just by number.  (subject line)

- Indentation was mangled by emailing, so I didn't review it.

- Some lines appeared to exceed the 80 columns limit.

- I'd have declared «int i;» twice in the two relevant blocks, rather
  than moved it to function scope, so it would have the least possible
  scope.  That would convert some bugs (using «i» in a wrong place) into
  compile errors (using an undeclared variable).

+ Order of elements of the new struct is good: wider types before
  narrower ones.  (Matters for padding bytes.)

That's a review for style.  I didn't review correctness.

Cheers,

Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Sahlberg <da...@gmail.com>.
Den lör 21 aug. 2021 kl 05:18 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Fri, 20 Aug 2021 10:30 +00:00:
> > Den fre 20 aug. 2021 kl 12:11 skrev Daniel Shahaf <
> d.s@daniel.shahaf.name>:
> > > Daniel Sahlberg wrote on Thu, Aug 19, 2021 at 23:23:49 +0200:
> > > >  [[[
> > > ⋮
> > > >
> ------------------------------------------------------------------------
> > > > r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1
> line
> > > > Merged via: r7, r5
> > > > ]]]
> > > > As can be seen here, it is not visible in the plain text output that
> r7 was
> > > > actually a reverse merge.
> > >
> > > Hmm.  There's an interesting question of whether we can amend the
> output
> > > to make that clear.  The CLI output is an API, and generally we try not
> > > to change it unless the user opts in to new features (e.g., when
> > > changelists were added, the CLI output for users who don't use
> > > changelists wasn't changed), but there are exceptions (e.g., adding the
> > > seventh column to `svn status` non-XML output).
> >
> > The two options today are:
> > - Reverse merged via: r7
> > or:
> > - Merged via: r7, r5
> >
> > The code is looking at the the way r4 was merged and then prints the
> > revision numbers from the merge stack. With the patch we have enough
> > information on the merge stack to print "(reverse)" on each revision
> > that was subtractive, for example:
> > - Merged via: r7 (reverse), r5
> >
> > Another option would be to print both (and each revision only in the
> > line where it belongs).
> > - Reverse merged via: r7
> > - Merged via: r5
> > To do this one would have to loop through the merge_stack twice (at
> > least) or keep a reasonably long buffer to store the list of revisions
> > in case of a long stack.
> >
>
> Hmm.  In favour of the second option is that it doesn't require CLI-
> parsing tools to learn about new syntax, but only to learn about two
> existing syntaxes being able to appear concurrently.  With a little
> luck, some of them will DTRT.
>
> Is there anything the first option can represent that the second one
> can't?  Perhaps some tree-like cases (rN was a merge of both rFOO and
> rBAR), but those aren't representable in the plain output anyway.
>

> For a buffer, we have svn_stringbuf_t that automatically reallocates
> itself as needed; and in any case, a two-pass algorithm would be easy to
> implement and to read and performant (it'd still be O(N)).
>
> > Both these changes might mess it up for someone consuming the CLI
> > output and expecting to find one or the other of Merged via: or Reverse
> > merged via:, or expecting to find just a simple list of revisions. I
> > can't judge the risk/reward for changing this.
>
> Would it be accurate to say that the incumbent output is wrong,
> misleading, a bug?
>

It would at least be misleading, since it isn't possible to figure out if a
merge some way down the tree was a forward or reverse merge. But that would
be easy to discover when looking at the code or inspecting the merges more
closely.

If so, we could make the change, as a bugfix.  The release notes could
> point it out and point to the XML output as a more-stable alternative.
>
> If not, we could record the idea as a milestone=2.0 issue.
>

I have let this question soak for a few days hoping someone else with more
experience would chime in.

I'm leaning towards setting this as a milestone=2.0 issue and for the time
leaving it alone not to risk destabilising anything.

Have you reviewed the patch from a coding style perspective? I would
appreciate a +1 from someone before committing.

Kind regards,
Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Fri, 20 Aug 2021 10:30 +00:00:
> Den fre 20 aug. 2021 kl 12:11 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> > Daniel Sahlberg wrote on Thu, Aug 19, 2021 at 23:23:49 +0200:
> > >  [[[
> > ⋮
> > > ------------------------------------------------------------------------
> > > r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line
> > > Merged via: r7, r5
> > > ]]]
> > > As can be seen here, it is not visible in the plain text output that r7 was
> > > actually a reverse merge.
> > 
> > Hmm.  There's an interesting question of whether we can amend the output
> > to make that clear.  The CLI output is an API, and generally we try not
> > to change it unless the user opts in to new features (e.g., when
> > changelists were added, the CLI output for users who don't use
> > changelists wasn't changed), but there are exceptions (e.g., adding the
> > seventh column to `svn status` non-XML output).
> 
> The two options today are:
> - Reverse merged via: r7
> or:
> - Merged via: r7, r5
> 
> The code is looking at the the way r4 was merged and then prints the 
> revision numbers from the merge stack. With the patch we have enough 
> information on the merge stack to print "(reverse)" on each revision 
> that was subtractive, for example:
> - Merged via: r7 (reverse), r5
> 
> Another option would be to print both (and each revision only in the 
> line where it belongs).
> - Reverse merged via: r7
> - Merged via: r5
> To do this one would have to loop through the merge_stack twice (at 
> least) or keep a reasonably long buffer to store the list of revisions 
> in case of a long stack.
> 

Hmm.  In favour of the second option is that it doesn't require CLI-
parsing tools to learn about new syntax, but only to learn about two
existing syntaxes being able to appear concurrently.  With a little
luck, some of them will DTRT.

Is there anything the first option can represent that the second one
can't?  Perhaps some tree-like cases (rN was a merge of both rFOO and
rBAR), but those aren't representable in the plain output anyway.

For a buffer, we have svn_stringbuf_t that automatically reallocates
itself as needed; and in any case, a two-pass algorithm would be easy to
implement and to read and performant (it'd still be O(N)).

> Both these changes might mess it up for someone consuming the CLI 
> output and expecting to find one or the other of Merged via: or Reverse 
> merged via:, or expecting to find just a simple list of revisions. I 
> can't judge the risk/reward for changing this.

Would it be accurate to say that the incumbent output is wrong, misleading, a bug?

If so, we could make the change, as a bugfix.  The release notes could
point it out and point to the XML output as a more-stable alternative.

If not, we could record the idea as a milestone=2.0 issue.

Cheers,

Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Sahlberg <da...@gmail.com>.
Den fre 20 aug. 2021 kl 12:11 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Thu, Aug 19, 2021 at 23:23:49 +0200:
> > Current output. Again XML indented for readability. It is the same WC as
> > before, but with an additional revision 7 where I reverse merge what was
> > merged in r6:
>
> At some point it becomes a good idea to post your test data so others
> (including future you) can reproduce or tinker.  Forms include:
>
> - shell script producing the working copy
>

[[[
#!/bin/sh

P=`pwd`
rm -rf $P/repo/ $P/wc/

svnadmin create repo
svn co file://$P/repo/ wc
cd wc
mkdir a
svn add a
svn ci a -m 'add a'
svn copy file://$P/repo/a file://$P/repo/b -m 'copy a to b'
svn copy file://$P/repo/b file://$P/repo/c -m 'copy b to c'
svn up
echo "a foo" > a/foo
svn add a/foo
svn ci a -m 'add a/foo'
svn up
svn merge ^/a b
svn ci -m 'merge a to b'
svn up
svn merge ^/b c
svn ci -m 'merge b to c'
svn up
svn merge -r HEAD:3 .
svn ci -m 'reverse merge everything'
svn up
]]]


> - test suite patch producing the working copy
>

log_with_merge_history_and_search should be used as a basis.


> >  [[[
> ⋮
> > ------------------------------------------------------------------------
> > r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line
> > Merged via: r7, r5
> > ]]]
> > As can be seen here, it is not visible in the plain text output that r7
> was
> > actually a reverse merge.
>
> Hmm.  There's an interesting question of whether we can amend the output
> to make that clear.  The CLI output is an API, and generally we try not
> to change it unless the user opts in to new features (e.g., when
> changelists were added, the CLI output for users who don't use
> changelists wasn't changed), but there are exceptions (e.g., adding the
> seventh column to `svn status` non-XML output).
>

The two options today are:
- Reverse merged via: r7
or:
- Merged via: r7, r5

The code is looking at the the way r4 was merged and then prints the
revision numbers from the merge stack. With the patch we have enough
information on the merge stack to print "(reverse)" on each revision that
was subtractive, for example:
- Merged via: r7 (reverse), r5

Another option would be to print both (and each revision only in the line
where it belongs).
- Reverse merged via: r7
- Merged via: r5
To do this one would have to loop through the merge_stack twice (at least)
or keep a reasonably long buffer to store the list of revisions in case of
a long stack.

Both these changes might mess it up for someone consuming the CLI output
and expecting to find one or the other of Merged via: or Reverse merged
via:, or expecting to find just a simple list of revisions. I can't judge
the risk/reward for changing this.

Kind regards,
Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Thu, Aug 19, 2021 at 23:23:49 +0200:
> Current output. Again XML indented for readability. It is the same WC as
> before, but with an additional revision 7 where I reverse merge what was
> merged in r6:

At some point it becomes a good idea to post your test data so others
(including future you) can reproduce or tinker.  Forms include:

- shell script producing the working copy
- dump file
- tar.gz'd working copy
- test suite patch producing the working copy

>  [[[
⋮
> ------------------------------------------------------------------------
> r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line
> Merged via: r7, r5
> ]]]
> As can be seen here, it is not visible in the plain text output that r7 was
> actually a reverse merge.

Hmm.  There's an interesting question of whether we can amend the output
to make that clear.  The CLI output is an API, and generally we try not
to change it unless the user opts in to new features (e.g., when
changelists were added, the CLI output for users who don't use
changelists wasn't changed), but there are exceptions (e.g., adding the
seventh column to `svn status` non-XML output).

Cheers,

Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Sahlberg <da...@gmail.com>.
Den tors 19 aug. 2021 kl 14:51 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Wed, Aug 18, 2021 at 07:54:47 +0200:
> > Den tis 17 aug. 2021 kl 05:36 skrev Daniel Shahaf <
> d.s@daniel.shahaf.name>:
> > > subversion/svn/schema/log.rnc doesn't mention reverse-merge at all.
> > > (I was looking to see whether the attribute was declared mandatory or
> not.)
> > >
> >
> > So, let's add it. Since it is not present in the case where there are no
> > merges, it should be optional. See if I got this right (first time
> touching
> > XML RELAX NG schemas):
> > [[[
> > Index: subversion/svn/schema/log.rnc
> > ===================================================================
> > --- subversion/svn/schema/log.rnc       (revision 1892122)
> > +++ subversion/svn/schema/log.rnc       (working copy)
> > @@ -28,7 +28,8 @@
> >  logentry =
> >    element logentry { attlist.logentry, author?, date?, paths?, msg?,
> revprops?, logentry* }
> >  attlist.logentry &=
> > -  attribute revision { revnum.type }
> > +  attribute revision { revnum.type },
> > +  attribute reverse-merge { "true" | "false" }?
>
> I don't know.  I've not worked with .rnc schemata before.  Do you have
> a more specific question about this change?  Any particular aspect of it
> you aren't sure about?
>

No, not really. FWIW, I've tested log.rnc using trang and it seems to
validate and also produces a reasonable xsd schema.

I have committed as r1892455.


>
> >  ## Changed paths information.
> >  paths = element paths { path+ }
> > ]]]
> >
> >
> > >
> > > Could adding the attribute to r5 break anything?
> > >
> >
> > The attribute is already known (to those observing the existing output of
> > -g). Most XML parsers (that I have worked with) ignore unknown or
> > unexpected attributes so I think it wouldn't be a problem - in any case
> > they will have trouble with r4 (see my example two mails upthread) where
> it
> > WILL be present.
> >
> > I also don't think it will be a problem NOT to have it, since it doesn't
> > exist in non -g outputs.
> >
> > The larger question is "do we want it?". In other words: Is it a property
> > of r5 (which was filtered out, so it should be filtered out as well) or
> is
> > it a property necessary to reconstruct the path to r4.
>
> I think we should include it, because:
>
> - The API provides it.  The CLI be to the API as the plain log output is
>   to the XML output.
>
> - It adds information: absence does not imply the value would be "true",
>   nor that it would be "false".
>
> > > > We could change the merge_stack to store this as well if needed, but
> I
> > > > don't have a clear picture about the merge/reverse merge flow.
> > >
> > > That's a bit of a large question.  Handwavingly, merging rN from /foo
> to
> > > /bar "adds some diff" to /bar and records that in svn:mergeinfo;
> > > thereafter, reverse-merging the same revision from the same path will
> > > reverse both of these changes.  Note: if the reverse merge was
> committed
> > > as rM, _that_ revision can be (forward) merged to a third branch.  That
> > > commit would then add /bar:M and remove /foo:N from the third branch's
> > > mergeinfo.
> > >
> >
> > Out of time right now but I intend to come back to this and try to
> > construct a tree with this sequence.
>
> *nod*
>

When checking this closer, I agree with danielsh's conclusion in a separate
message (at 15:42 UTC) that the reverse-merge is a property of the parent
revision (r7 below, r6 in the previous example) and that it is necessary to
have it to properly reconstruct the path to r4. Thus we should not elide it
(even if we elide all the other properties of revisions that doesn't match
the search pattern).

Current output. Again XML indented for readability. It is the same WC as
before, but with an additional revision 7 where I reverse merge what was
merged in r6:
 [[[
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
-g .
------------------------------------------------------------------------
r7 | dsahlberg | 2021-08-19 22:08:21 +0200 (Thu, 19 Aug 2021) | 1 line

reverse merge
------------------------------------------------------------------------
r5 | dsahlberg | 2021-08-19 21:58:29 +0200 (Thu, 19 Aug 2021) | 1 line
Reverse merged via: r7

merge a to b
------------------------------------------------------------------------
r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line
Merged via: r7, r5
]]]
As can be seen here, it is not visible in the plain text output that r7 was
actually a reverse merge.
[[[

add a/foo
------------------------------------------------------------------------
r6 | dsahlberg | 2021-08-19 21:58:29 +0200 (Thu, 19 Aug 2021) | 1 line

merge b to c
------------------------------------------------------------------------
r3 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line

copy b to c
------------------------------------------------------------------------
r2 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line

copy a to b
------------------------------------------------------------------------
r1 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line

add a
------------------------------------------------------------------------
dsahlberg@daniel-2020:~/temp/wc/c$ svn log -g --xml .
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="7">
    <author>dsahlberg</author>
    <date>2021-08-19T20:08:21.276818Z</date>
    <msg>reverse merge</msg>
    <logentry reverse-merge="true" revision="5">
      <author>dsahlberg</author>
      <date>2021-08-19T19:58:29.079067Z</date>
      <msg>merge a to b</msg>
      <logentry reverse-merge="false" revision="4">
        <author>dsahlberg</author>
        <date>2021-08-19T19:58:28.948418Z</date>
        <msg>add a/foo</msg>
      </logentry>
    </logentry>
  </logentry>
  <logentry revision="6">
    <author>dsahlberg</author>
    <date>2021-08-19T19:58:29.210735Z</date>
    <msg>merge b to c</msg>
  </logentry>
  <logentry revision="3">
    <author>dsahlberg</author>
    <date>2021-08-19T19:58:28.845491Z</date>
    <msg>copy b to c</msg>
  </logentry>
  <logentry revision="2">
    <author>dsahlberg</author>
    <date>2021-08-19T19:58:28.800706Z</date>
    <msg>copy a to b</msg>
  </logentry>
    <logentry revision="1">
    <author>dsahlberg</author>
    <date>2021-08-19T19:58:28.742409Z</date>
    <msg>add a</msg>
  </logentry>
</log>
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
-g --search foo .
------------------------------------------------------------------------
r4 | dsahlberg | 2021-08-19 21:58:28 +0200 (Thu, 19 Aug 2021) | 1 line
Merged via: r7, r5

add a/foo
------------------------------------------------------------------------
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
-g --search foo --xml .
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="7">
    <logentry reverse-merge="true" revision="5">
      <logentry reverse-merge="false" revision="4">
        <author>dsahlberg</author>
        <date>2021-08-19T19:58:28.948418Z</date>
        <msg>add a/foo</msg>
      </logentry>
    </logentry>
  </logentry>
</log>
dsahlberg@daniel-2020:~/temp/wc/c$
]]]

Current patch:
[[[
Index: subversion/svn/log-cmd.c
===================================================================
--- subversion/svn/log-cmd.c    (revision 1892122)
+++ subversion/svn/log-cmd.c    (working copy)
@@ -45,6 +45,12 @@

 #include "svn_private_config.h"

+typedef struct merge_stack {
+  svn_revnum_t rev;
+  svn_boolean_t emitted;
+  svn_boolean_t subtractive_merge;
+} merge_stack_t;
+

 /*** Code. ***/

@@ -355,10 +361,15 @@
     {
       if (log_entry->has_children)
         {
+         merge_stack_t ms;
+
           if (! lb->merge_stack)
-            lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(svn_revnum_t));
+            lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(merge_stack_t));

-          APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) =
log_entry->revision;
+          ms.rev = log_entry->revision;
+          ms.emitted = FALSE;
+          ms.subtractive_merge = log_entry->subtractive_merge;
+          APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
         }

       return SVN_NO_ERROR;
@@ -435,10 +446,10 @@
       iterpool = svn_pool_create(pool);
       for (i = 0; i < lb->merge_stack->nelts; i++)
         {
-          svn_revnum_t rev = APR_ARRAY_IDX(lb->merge_stack, i,
svn_revnum_t);
+          merge_stack_t ms = APR_ARRAY_IDX(lb->merge_stack, i,
merge_stack_t);

           svn_pool_clear(iterpool);
-          SVN_ERR(svn_cmdline_printf(iterpool, " r%ld%c", rev,
+          SVN_ERR(svn_cmdline_printf(iterpool, " r%ld%c", ms.rev,
                                      i == lb->merge_stack->nelts - 1 ?
                                                                   '\n' :
','));
         }
@@ -475,10 +486,15 @@

   if (log_entry->has_children)
     {
+      merge_stack_t ms;
+
       if (! lb->merge_stack)
-        lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(svn_revnum_t));
+        lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(merge_stack_t));

-      APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = log_entry->revision;
+      ms.rev = log_entry->revision;
+      ms.emitted = FALSE;
+      ms.subtractive_merge = log_entry->subtractive_merge;
+      APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
     }

   return SVN_NO_ERROR;
@@ -533,6 +549,7 @@
   const char *author;
   const char *date;
   const char *message;
+  int i;

   if (lb->ctx->cancel_func)
     SVN_ERR(lb->ctx->cancel_func(lb->ctx->cancel_baton));
@@ -544,11 +561,15 @@

   if (! SVN_IS_VALID_REVNUM(log_entry->revision))
     {
-      svn_xml_make_close_tag(&sb, pool, "logentry");
-      SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
       if (lb->merge_stack)
-        apr_array_pop(lb->merge_stack);
-
+        {
+          if (lb->merge_stack->nelts > 0 &&
(APR_ARRAY_IDX(lb->merge_stack, lb->merge_stack->nelts-1,
merge_stack_t).emitted))
+            {
+              svn_xml_make_close_tag(&sb, pool, "logentry");
+              SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
+            }
+          apr_array_pop(lb->merge_stack);
+        }
       return SVN_NO_ERROR;
     }

@@ -559,15 +580,44 @@
     {
       if (log_entry->has_children)
         {
+          merge_stack_t ms;
           if (! lb->merge_stack)
-            lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(svn_revnum_t));
+            lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(merge_stack_t));

-          APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) =
log_entry->revision;
+          ms.rev = log_entry->revision;
+          ms.emitted = FALSE;
+          ms.subtractive_merge = log_entry->subtractive_merge;
+          APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
         }

       return SVN_NO_ERROR;
     }
-
+  else if (lb->search_patterns && lb->merge_stack)
+    {
+    /* match_search_patterns returned true */
+    for (i = 0; i < lb->merge_stack->nelts; i++)
+      {
+        merge_stack_t ms = APR_ARRAY_IDX(lb->merge_stack, i,
merge_stack_t);
+        if (!ms.emitted)
+          {
+            revstr = apr_psprintf(pool, "%ld", ms.rev);
+            if (i == 0)
+              {
+                svn_xml_make_open_tag(&sb, pool, svn_xml_normal,
"logentry",
+                                      "revision", revstr, SVN_VA_NULL);
+              }
+            else
+              {
+                svn_xml_make_open_tag(&sb, pool, svn_xml_normal,
"logentry",
+                                      "revision", revstr, "reverse-merge",
+                                      ms.subtractive_merge ? "true" :
"false",
+                                      SVN_VA_NULL);
+              }
+            APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).emitted =
TRUE;
+          }
+      }
+  }
+
   if (author)
     author = svn_xml_fuzzy_escape(author, pool);
   if (date)
@@ -606,7 +656,6 @@
   if (log_entry->changed_paths2)
     {
       apr_array_header_t *sorted_paths;
-      int i;

       /* <paths> */
       svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "paths",
@@ -684,10 +733,15 @@

   if (log_entry->has_children)
     {
+      merge_stack_t ms;
+
       if (! lb->merge_stack)
-        lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(svn_revnum_t));
+        lb->merge_stack = apr_array_make(lb->pool, 1,
sizeof(merge_stack_t));

-      APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = log_entry->revision;
+      ms.rev = log_entry->revision;
+      ms.emitted = TRUE;
+      ms.subtractive_merge = log_entry->subtractive_merge;
+      APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
     }
   else
     svn_xml_make_close_tag(&sb, pool, "logentry");
Index: subversion/tests/cmdline/log_tests.py
===================================================================
--- subversion/tests/cmdline/log_tests.py       (revision 1892456)
+++ subversion/tests/cmdline/log_tests.py       (working copy)
@@ -2779,7 +2779,6 @@
                                      '',
                                      '-q', '-c', '1-2')

-@XFail()
 @Issue(4711)
 def log_with_merge_history_and_search(sbox):
   "log --use-merge-history --search"
]]]

Kind regards,
Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Wed, Aug 18, 2021 at 07:54:47 +0200:
> Den tis 17 aug. 2021 kl 05:36 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> 
> > Daniel Sahlberg wrote on Sun, Aug 15, 2021 at 22:11:29 +0200:
> > > Den lör 14 aug. 2021 kl 13:21 skrev Daniel Shahaf <
> > d.s@daniel.shahaf.name>:
> > > > Specifically, I was pointing out site/tools/upcoming.py as a tool that
> > > > synthesizes plain output from XML output.  Shouldn't any change to XML
> > > > or plain output retain the possibility to synthesize plain output from
> > > > XML output?  It's not clear to me whether this is the case in
> > trunk@HEAD
> > > > for «log -g», nor whether it would be the case under the proposal.
> > > >
> > >
> > > Oh, sorry, misunderstanding. Agree that the XML should output a superset
> > of
> > > the information in plaintext.
> > >
> > > In trunk@HEAD this is not the case for the bug under discussion (where
> > > --search elide the opening tags for r6 and r5 in the example above) but
> > it
> > > is true in the other cases.
> > >
> > > With the proposed patch it is true in all cases (well, unless there are
> > > bugs).
> >
> > *nod*
> >
> > > There is one thing I'm not sure about. There is an attribute
> > reverse-merge,
> > > it corresponds to log_entry->subtractive_merge. With --search this
> > > attribute is elided for the revisions that don't match the search
> > pattern.
> > > This is the same for the plaintext log (it only look for it in r4 to
> > decide
> > > to print either "Reverse merged via" or "Merged via") and the property is
> > > present in r4 so the XML is enough to synthesize the plaintext output. Is
> > > there any point in having this information also for r5?
> >
> > Sure: provide to cmdline XML consumers data that the C API provides.
> > It's similar to the relation between plain and XML log outputs, in that
> > the former is a subset of the latter.
> >
> > subversion/svn/schema/log.rnc doesn't mention reverse-merge at all.
> > (I was looking to see whether the attribute was declared mandatory or not.)
> >
> 
> So, let's add it. Since it is not present in the case where there are no
> merges, it should be optional. See if I got this right (first time touching
> XML RELAX NG schemas):
> [[[
> Index: subversion/svn/schema/log.rnc
> ===================================================================
> --- subversion/svn/schema/log.rnc       (revision 1892122)
> +++ subversion/svn/schema/log.rnc       (working copy)
> @@ -28,7 +28,8 @@
>  logentry =
>    element logentry { attlist.logentry, author?, date?, paths?, msg?, revprops?, logentry* }
>  attlist.logentry &=
> -  attribute revision { revnum.type }
> +  attribute revision { revnum.type },
> +  attribute reverse-merge { "true" | "false" }?

I don't know.  I've not worked with .rnc schemata before.  Do you have
a more specific question about this change?  Any particular aspect of it
you aren't sure about?

>  ## Changed paths information.
>  paths = element paths { path+ }
> ]]]
> 
> 
> >
> > Could adding the attribute to r5 break anything?
> >
> 
> The attribute is already known (to those observing the existing output of
> -g). Most XML parsers (that I have worked with) ignore unknown or
> unexpected attributes so I think it wouldn't be a problem - in any case
> they will have trouble with r4 (see my example two mails upthread) where it
> WILL be present.
> 
> I also don't think it will be a problem NOT to have it, since it doesn't
> exist in non -g outputs.
> 
> The larger question is "do we want it?". In other words: Is it a property
> of r5 (which was filtered out, so it should be filtered out as well) or is
> it a property necessary to reconstruct the path to r4.

I think we should include it, because:

- The API provides it.  The CLI be to the API as the plain log output is
  to the XML output.

- It adds information: absence does not imply the value would be "true",
  nor that it would be "false".

> > > We could change the merge_stack to store this as well if needed, but I
> > > don't have a clear picture about the merge/reverse merge flow.
> >
> > That's a bit of a large question.  Handwavingly, merging rN from /foo to
> > /bar "adds some diff" to /bar and records that in svn:mergeinfo;
> > thereafter, reverse-merging the same revision from the same path will
> > reverse both of these changes.  Note: if the reverse merge was committed
> > as rM, _that_ revision can be (forward) merged to a third branch.  That
> > commit would then add /bar:M and remove /foo:N from the third branch's
> > mergeinfo.
> >
> 
> Out of time right now but I intend to come back to this and try to
> construct a tree with this sequence.

*nod*

> > Well, so long as they don't ask us to do a _Two Ronnies_ skit at the
> > next hackathon :-)
> >
> 
> Someone volonteer to host a new hackathon and I would probably consider
> doing a Two Ronnies skit.
> (Darn, just realised I might have banned myself for life).

Hey, don't jump to conclusions.  Knowing our fellow developers, they'll
set the invariant lower, at "Daniel attends nand Daniel attends".

Cheers,

Daniel


> Kind regards,
> Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Sahlberg <da...@gmail.com>.
Den tis 17 aug. 2021 kl 05:36 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Sun, Aug 15, 2021 at 22:11:29 +0200:
> > Den lör 14 aug. 2021 kl 13:21 skrev Daniel Shahaf <
> d.s@daniel.shahaf.name>:
> > > Specifically, I was pointing out site/tools/upcoming.py as a tool that
> > > synthesizes plain output from XML output.  Shouldn't any change to XML
> > > or plain output retain the possibility to synthesize plain output from
> > > XML output?  It's not clear to me whether this is the case in
> trunk@HEAD
> > > for «log -g», nor whether it would be the case under the proposal.
> > >
> >
> > Oh, sorry, misunderstanding. Agree that the XML should output a superset
> of
> > the information in plaintext.
> >
> > In trunk@HEAD this is not the case for the bug under discussion (where
> > --search elide the opening tags for r6 and r5 in the example above) but
> it
> > is true in the other cases.
> >
> > With the proposed patch it is true in all cases (well, unless there are
> > bugs).
>
> *nod*
>
> > There is one thing I'm not sure about. There is an attribute
> reverse-merge,
> > it corresponds to log_entry->subtractive_merge. With --search this
> > attribute is elided for the revisions that don't match the search
> pattern.
> > This is the same for the plaintext log (it only look for it in r4 to
> decide
> > to print either "Reverse merged via" or "Merged via") and the property is
> > present in r4 so the XML is enough to synthesize the plaintext output. Is
> > there any point in having this information also for r5?
>
> Sure: provide to cmdline XML consumers data that the C API provides.
> It's similar to the relation between plain and XML log outputs, in that
> the former is a subset of the latter.
>
> subversion/svn/schema/log.rnc doesn't mention reverse-merge at all.
> (I was looking to see whether the attribute was declared mandatory or not.)
>

So, let's add it. Since it is not present in the case where there are no
merges, it should be optional. See if I got this right (first time touching
XML RELAX NG schemas):
[[[
Index: subversion/svn/schema/log.rnc
===================================================================
--- subversion/svn/schema/log.rnc       (revision 1892122)
+++ subversion/svn/schema/log.rnc       (working copy)
@@ -28,7 +28,8 @@
 logentry =
   element logentry { attlist.logentry, author?, date?, paths?, msg?,
revprops?, logentry* }
 attlist.logentry &=
-  attribute revision { revnum.type }
+  attribute revision { revnum.type },
+  attribute reverse-merge { "true" | "false" }?

 ## Changed paths information.
 paths = element paths { path+ }
]]]


>
> Could adding the attribute to r5 break anything?
>

The attribute is already known (to those observing the existing output of
-g). Most XML parsers (that I have worked with) ignore unknown or
unexpected attributes so I think it wouldn't be a problem - in any case
they will have trouble with r4 (see my example two mails upthread) where it
WILL be present.

I also don't think it will be a problem NOT to have it, since it doesn't
exist in non -g outputs.

The larger question is "do we want it?". In other words: Is it a property
of r5 (which was filtered out, so it should be filtered out as well) or is
it a property necessary to reconstruct the path to r4.


> > We could change the merge_stack to store this as well if needed, but I
> > don't have a clear picture about the merge/reverse merge flow.
>
> That's a bit of a large question.  Handwavingly, merging rN from /foo to
> /bar "adds some diff" to /bar and records that in svn:mergeinfo;
> thereafter, reverse-merging the same revision from the same path will
> reverse both of these changes.  Note: if the reverse merge was committed
> as rM, _that_ revision can be (forward) merged to a third branch.  That
> commit would then add /bar:M and remove /foo:N from the third branch's
> mergeinfo.
>

Out of time right now but I intend to come back to this and try to
construct a tree with this sequence.


> Well, so long as they don't ask us to do a _Two Ronnies_ skit at the
> next hackathon :-)
>

Someone volonteer to host a new hackathon and I would probably consider
doing a Two Ronnies skit.
(Darn, just realised I might have banned myself for life).

Kind regards,
Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Sun, Aug 15, 2021 at 22:11:29 +0200:
> Den lör 14 aug. 2021 kl 13:21 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> 
> > Daniel Sahlberg wrote on Fri, Aug 13, 2021 at 16:55:38 +0200:
> > > danielsh had two comments on IRC,
> > > https://colabti.org/irclogger/irclogger_log/svn-dev?date=2021-08-09,
> > which
> > > I will try to address here. I'm taking the liberty of copying from the
> > IRC
> > > log to get some context.
> >
> > Thanks, and sorry for not posting them on the list as I planned to;
> > suffice it to say my Wednesday didn't go according to plan.
> >
> 
> NP, that's what I thought. Hope you didn't mind I copied your questions
> from IRC to shortciruit one round at the mailinglist.

Thanks for doing that.

> > Okay, and what about the example above with two levels of nesting?  How
> > does that compare?  Do XML and plain output modes implement filtering
> > the same way (e.g., do both of them elide details of r11)?
> >
> 
> Again, sorry for not having the original WC. This is my current test WC: a,
> b and c are separate folders directly under the root. The XML output has
> been indented for readability purposes. The examples are done with a
> patched svn using my second approach
> [[[
⋮
> ]]]
> 
> Yes, they implement filtering the same way, only the revision that actually
> match is printed. The merge history is either emitted as part of the tree
> or in the Merged via: line (see log-cmd.c, lines 425 to 446, that part of
> the code mirrors the loop to over the merge_stack to create opening tags
> for all revisions !emitted).

Thanks for the verbose example.  Sounds good.

> > Specifically, I was pointing out site/tools/upcoming.py as a tool that
> > synthesizes plain output from XML output.  Shouldn't any change to XML
> > or plain output retain the possibility to synthesize plain output from
> > XML output?  It's not clear to me whether this is the case in trunk@HEAD
> > for «log -g», nor whether it would be the case under the proposal.
> >
> 
> Oh, sorry, misunderstanding. Agree that the XML should output a superset of
> the information in plaintext.
> 
> In trunk@HEAD this is not the case for the bug under discussion (where
> --search elide the opening tags for r6 and r5 in the example above) but it
> is true in the other cases.
> 
> With the proposed patch it is true in all cases (well, unless there are
> bugs).

*nod*

> There is one thing I'm not sure about. There is an attribute reverse-merge,
> it corresponds to log_entry->subtractive_merge. With --search this
> attribute is elided for the revisions that don't match the search pattern.
> This is the same for the plaintext log (it only look for it in r4 to decide
> to print either "Reverse merged via" or "Merged via") and the property is
> present in r4 so the XML is enough to synthesize the plaintext output. Is
> there any point in having this information also for r5?

Sure: provide to cmdline XML consumers data that the C API provides.
It's similar to the relation between plain and XML log outputs, in that
the former is a subset of the latter.

subversion/svn/schema/log.rnc doesn't mention reverse-merge at all.
(I was looking to see whether the attribute was declared mandatory or not.)

Could adding the attribute to r5 break anything?

> We could change the merge_stack to store this as well if needed, but I
> don't have a clear picture about the merge/reverse merge flow.

That's a bit of a large question.  Handwavingly, merging rN from /foo to
/bar "adds some diff" to /bar and records that in svn:mergeinfo;
thereafter, reverse-merging the same revision from the same path will
reverse both of these changes.  Note: if the reverse merge was committed
as rM, _that_ revision can be (forward) merged to a third branch.  That
commit would then add /bar:M and remove /foo:N from the third branch's
mergeinfo.

> > Thanks, Daniel.
> >
> > Daniel
> >
> 
> I hope this answers your questions!
> 
> Thanks, Daniel.
> 
> Daniel
> 
> (Sticking to your ending of the mail. Why change a winning concept :-)
> /dsahlberg)

Well, so long as they don't ask us to do a _Two Ronnies_ skit at the
next hackathon :-)

Cheers,

Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Sahlberg <da...@gmail.com>.
Den lör 14 aug. 2021 kl 13:21 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Fri, Aug 13, 2021 at 16:55:38 +0200:
> > danielsh had two comments on IRC,
> > https://colabti.org/irclogger/irclogger_log/svn-dev?date=2021-08-09,
> which
> > I will try to address here. I'm taking the liberty of copying from the
> IRC
> > log to get some context.
>
> Thanks, and sorry for not posting them on the list as I planned to;
> suffice it to say my Wednesday didn't go according to plan.
>

NP, that's what I thought. Hope you didn't mind I copied your questions
from IRC to shortciruit one round at the mailinglist.


> > Den mån 9 aug. 2021 kl 03:12 skrev Daniel Sahlberg <
> > daniel.l.sahlberg@gmail.com>:
> > > I'm proposing to keep the XML tree intact, but filter out the
> > > information that doesn't match the search pattern. If searching for
> "add
> > > 1 file", the result would be as follows:
> > >
> > > [[[
> > >      <logentry revision="12">
> > >          <logentry revision="11">
> > >              <logentry revision="10" reverse-merge="false">
> > >                  <author>dsahlberg</author>
> > > <date>2021-08-08T19:29:10.860725Z</date>
> > >                  <msg>add 1 file</msg>
> > >              </logentry>
> > >          </logentry>
> > >      </logentry>
> > > ]]]
> > >
> >
> > danielsh: 1. Re your "I'm proposing", how would with/without XML compare
> to
> > each other? Cf. site/tools/ compiling XML output into plain output.
> >
> > (It seems I didn't keep my test repo, but I hope the examples should be
> > readable anyway).
> >
> > For readability, I have reformatted the XML with proper indentation:
> > [[[
> > dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn
> log
> > --use-merge-history --search 'merge a'
> > ------------------------------------------------------------------------
> > r5 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line
> > Merged via: r6
> >
> > merge a to b
> > ------------------------------------------------------------------------
> > dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn
> log
> > --use-merge-history --search 'merge a' --xml
> > <?xml version="1.0" encoding="UTF-8"?>
> > <log>
> >     <logentry revision="6">
> >         <logentry reverse-merge="false" revision="5">
> >             <author>dsahlberg</author>
> >             <date>2021-08-09T00:31:34.969440Z</date>
> >             <msg>merge a to b</msg>
> >         </logentry>
> >     </logentry>
> > </log>
> > ]]]
> >
> > In plain text output the inheritance is show by the "Merged via:" line,
> in
> > XML is found by examining the tree.
>
> Okay, and what about the example above with two levels of nesting?  How
> does that compare?  Do XML and plain output modes implement filtering
> the same way (e.g., do both of them elide details of r11)?
>

Again, sorry for not having the original WC. This is my current test WC: a,
b and c are separate folders directly under the root. The XML output has
been indented for readability purposes. The examples are done with a
patched svn using my second approach
[[[
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk/subversion/svn/svn log -g
------------------------------------------------------------------------
r6 | dsahlberg | 2021-08-09 02:31:35 +0200 (Mon, 09 Aug 2021) | 1 line

merge b to c
------------------------------------------------------------------------
r5 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line
Merged via: r6

merge a to b
------------------------------------------------------------------------
r4 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line
Merged via: r6, r5

add a/foo
------------------------------------------------------------------------
r3 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line

copy b to c
------------------------------------------------------------------------
r2 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line

copy a to b
------------------------------------------------------------------------
r1 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line

add a
------------------------------------------------------------------------
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk/subversion/svn/svn log -g
--xml
<?xml version="1.0" encoding="UTF-8"?>
<log>
    <logentry revision="6">
        <author>dsahlberg</author>
        <date>2021-08-09T00:31:35.068294Z</date>
        <msg>merge b to c</msg>
        <logentry revision="5" reverse-merge="false">
            <author>dsahlberg</author>
            <date>2021-08-09T00:31:34.969440Z</date>
            <msg>merge a to b</msg>
            <logentry revision="4" reverse-merge="false">
                <author>dsahlberg</author>
                <date>2021-08-09T00:31:34.870273Z</date>
                <msg>add a/foo</msg>
            </logentry>
        </logentry>
    </logentry>
    <logentry revision="3">
        <author>dsahlberg</author>
        <date>2021-08-09T00:31:34.798928Z</date>
    <msg>copy b to c</msg>
    </logentry>
    <logentry revision="2">
        <author>dsahlberg</author>
        <date>2021-08-09T00:31:34.767518Z</date>
        <msg>copy a to b</msg>
    </logentry>
    <logentry revision="1">
        <author>dsahlberg</author>
        <date>2021-08-09T00:31:34.725920Z</date>
        <msg>add a</msg>
    </logentry>
</log>
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk/subversion/svn/svn log -g
--search foo
------------------------------------------------------------------------
r4 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line
Merged via: r6, r5

add a/foo
------------------------------------------------------------------------
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk/subversion/svn/svn log -g
--search foo --xml
<?xml version="1.0" encoding="UTF-8"?>
<log>
    <logentry revision="6">
        <logentry revision="5">
            <logentry reverse-merge="false" revision="4">
                <author>dsahlberg</author>
                <date>2021-08-09T00:31:34.870273Z</date>
                <msg>add a/foo</msg>
            </logentry>
        </logentry>
    </logentry>
</log>
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk/subversion/svn/svn log -g
--search 'no'
------------------------------------------------------------------------
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk/subversion/svn/svn log -g
--xml --search 'no'
<?xml version="1.0" encoding="UTF-8"?>
<log>
</log>
dsahlberg@daniel-2020:~/temp/wc/c$
]]]

Yes, they implement filtering the same way, only the revision that actually
match is printed. The merge history is either emitted as part of the tree
or in the Merged via: line (see log-cmd.c, lines 425 to 446, that part of
the code mirrors the loop to over the merge_stack to create opening tags
for all revisions !emitted).

> I checked site/tools/upcoming.py. It doesn't use "-g" (which is
> responsible
> > for generating the recursive tree) so it expects all logentry elements to
> > be children of <log>, ie, it would not consider revision 5 above. Any
> tool
> > that is using "-g" should already be able to traverse a tree with
> logentry
> > elements that are children of other logentry elements (which could be
> > children of other logentry elements which ...).
>
> Sure, tools that don't use -g ignore nested <logentry>.  No question
> about that.  My question was a bit different, though.  In context of
> your proposal:
>
> > > I'm proposing to keep the XML tree intact, but filter out the
> > > information that doesn't match the search pattern.
>
> The proposal does not discuss non-XML output at all.  However, the XML
> and plain output are literally two different output syntaxes of the same
> subcommand, so I was looking for a comment on whether the proposed
> change does or does not align with the output of the same command when
> «--xml» is removed.
>
> Specifically, I was pointing out site/tools/upcoming.py as a tool that
> synthesizes plain output from XML output.  Shouldn't any change to XML
> or plain output retain the possibility to synthesize plain output from
> XML output?  It's not clear to me whether this is the case in trunk@HEAD
> for «log -g», nor whether it would be the case under the proposal.
>

Oh, sorry, misunderstanding. Agree that the XML should output a superset of
the information in plaintext.

In trunk@HEAD this is not the case for the bug under discussion (where
--search elide the opening tags for r6 and r5 in the example above) but it
is true in the other cases.

With the proposed patch it is true in all cases (well, unless there are
bugs).

There is one thing I'm not sure about. There is an attribute reverse-merge,
it corresponds to log_entry->subtractive_merge. With --search this
attribute is elided for the revisions that don't match the search pattern.
This is the same for the plaintext log (it only look for it in r4 to decide
to print either "Reverse merged via" or "Merged via") and the property is
present in r4 so the XML is enough to synthesize the plaintext output. Is
there any point in having this information also for r5? We could change the
merge_stack to store this as well if needed, but I don't have a clear
picture about the merge/reverse merge flow.

Thanks, Daniel.
>
> Daniel
>

I hope this answers your questions!

Thanks, Daniel.

Daniel

(Sticking to your ending of the mail. Why change a winning concept :-)
/dsahlberg)

Re: [PATCH] Issue #4711

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Fri, Aug 13, 2021 at 16:55:38 +0200:
> danielsh had two comments on IRC,
> https://colabti.org/irclogger/irclogger_log/svn-dev?date=2021-08-09, which
> I will try to address here. I'm taking the liberty of copying from the IRC
> log to get some context.

Thanks, and sorry for not posting them on the list as I planned to;
suffice it to say my Wednesday didn't go according to plan.

> Den mån 9 aug. 2021 kl 03:12 skrev Daniel Sahlberg <
> daniel.l.sahlberg@gmail.com>:
> > I'm proposing to keep the XML tree intact, but filter out the
> > information that doesn't match the search pattern. If searching for "add
> > 1 file", the result would be as follows:
> >
> > [[[
> >      <logentry revision="12">
> >          <logentry revision="11">
> >              <logentry revision="10" reverse-merge="false">
> >                  <author>dsahlberg</author>
> > <date>2021-08-08T19:29:10.860725Z</date>
> >                  <msg>add 1 file</msg>
> >              </logentry>
> >          </logentry>
> >      </logentry>
> > ]]]
> >
> 
> danielsh: 1. Re your "I'm proposing", how would with/without XML compare to
> each other? Cf. site/tools/ compiling XML output into plain output.
> 
> (It seems I didn't keep my test repo, but I hope the examples should be
> readable anyway).
> 
> For readability, I have reformatted the XML with proper indentation:
> [[[
> dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
> --use-merge-history --search 'merge a'
> ------------------------------------------------------------------------
> r5 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line
> Merged via: r6
> 
> merge a to b
> ------------------------------------------------------------------------
> dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
> --use-merge-history --search 'merge a' --xml
> <?xml version="1.0" encoding="UTF-8"?>
> <log>
>     <logentry revision="6">
>         <logentry reverse-merge="false" revision="5">
>             <author>dsahlberg</author>
>             <date>2021-08-09T00:31:34.969440Z</date>
>             <msg>merge a to b</msg>
>         </logentry>
>     </logentry>
> </log>
> ]]]
> 
> In plain text output the inheritance is show by the "Merged via:" line, in
> XML is found by examining the tree.

Okay, and what about the example above with two levels of nesting?  How
does that compare?  Do XML and plain output modes implement filtering
the same way (e.g., do both of them elide details of r11)?

> I checked site/tools/upcoming.py. It doesn't use "-g" (which is responsible
> for generating the recursive tree) so it expects all logentry elements to
> be children of <log>, ie, it would not consider revision 5 above. Any tool
> that is using "-g" should already be able to traverse a tree with logentry
> elements that are children of other logentry elements (which could be
> children of other logentry elements which ...).

Sure, tools that don't use -g ignore nested <logentry>.  No question
about that.  My question was a bit different, though.  In context of
your proposal:

> > I'm proposing to keep the XML tree intact, but filter out the
> > information that doesn't match the search pattern.

The proposal does not discuss non-XML output at all.  However, the XML
and plain output are literally two different output syntaxes of the same
subcommand, so I was looking for a comment on whether the proposed
change does or does not align with the output of the same command when
«--xml» is removed.

Specifically, I was pointing out site/tools/upcoming.py as a tool that
synthesizes plain output from XML output.  Shouldn't any change to XML
or plain output retain the possibility to synthesize plain output from
XML output?  It's not clear to me whether this is the case in trunk@HEAD
for «log -g», nor whether it would be the case under the proposal.

> > I have run the testsuite with both patch versions without any problems.
> > (I have obviously removed XFail() from the test for #4711 in log_tests.py).
> >
> 
> danielsh: 2. Wondering about a new svn_stream_t type that wraps another
> stream, caches writes until svn_stream_flush(), and has an
> undo_last_write() function
> danielsh: A 2-tuple struct may well be better :-)
> 
> There is still the need to store "state" somewhere to know if we should
> flush() or undo(). Pseudo call-sequence (each line starting with - is a
> call to the receiver), I'm using the same WC as above with an additional
> revision 7. The calls with r-1 is the invalid revision number used to
> generate the </logentry> closing tag after evaluating the merge history
> (the -g).
> 
> - receiver r7
>   ... match_search_patterns succeed. We want this revision. Output
> contents. No children so output the closing tag immediately. Flush().
> - receiver r6
>   ... match_search_patterns fail, so we don't want this revision. It has
> children so output anyway (but possibly ignoring <msg> etc.).
> - receiver r5 (because of -g)
>   ... match_search_patterns succeed, so we want this revision. Output
> contents. No children so output the closing tag immediately. Flush().
> - receiver -1 (this was the end of the merge history)
>   ... Output closing tag. Now we need to decide if to Flush() or Undo()
> depending on if anything was emitted since r6. (The contents of r5 has
> already been flush()ed but we also need to flush the </logentry> tag).
> The end [riding off into the sunset].
> 
> The only way I can come up with is to store this information in the
> merge_stack.
> 
> That being said, I'm not opposed to use an undo()-able stream but I don't
> think it is enough by itself.

Thanks.  I don't completely follow why we'd need to keep additional
state to decide whether to flush() or undo() — I thought (perhaps
incorrectly) that the stream's internal queued writes would be all
that's needed — but you do, so let's consider this point resolved.

Thanks, Daniel.

Daniel

Re: [PATCH] Issue #4711

Posted by Daniel Sahlberg <da...@gmail.com>.
danielsh had two comments on IRC,
https://colabti.org/irclogger/irclogger_log/svn-dev?date=2021-08-09, which
I will try to address here. I'm taking the liberty of copying from the IRC
log to get some context.

Den mån 9 aug. 2021 kl 03:12 skrev Daniel Sahlberg <
daniel.l.sahlberg@gmail.com>:

> Hi,
>
> I've been researching issue #4711 [1] and, as requested by danielsh in
> JIRA, I'm turning to dev@ to discuss a possible solution.
>
> As already found by danielsh, the problem lies in the filtering code in
> svn_cl__log_entry_receiver_xml. If the log entry doesn't match the
> search pattern, the code will return without emitting any opening tag.
> However when the library calls back to close a tag, it is done without
> checking if the opening tag was emitted or not.
>
> Without filtering, the tree is recursive. I have made a test case where
> I have three branches, a, b and c. I've made commits to a, which I have
> subsequently merged to b and then from b to c. The XML is as follows:
>
> [[[
>      <logentry revision="12">
>          <author>dsahlberg</author>
>          <date>2021-08-08T19:29:38.411358Z</date>
>          <msg>merge b to c</msg>
>          <logentry reverse-merge="false" revision="11">
>              <author>dsahlberg</author>
>              <date>2021-08-08T19:29:26.563564Z</date>
>              <msg>merge a to b</msg>
>              <logentry revision="10" reverse-merge="false">
>                  <author>dsahlberg</author>
> <date>2021-08-08T19:29:10.860725Z</date>
>                  <msg>add 1 file</msg>
>              </logentry>
>          </logentry>
>      </logentry>
> ]]]
>
> I'm proposing to keep the XML tree intact, but filter out the
> information that doesn't match the search pattern. If searching for "add
> 1 file", the result would be as follows:
>
> [[[
>      <logentry revision="12">
>          <logentry revision="11">
>              <logentry revision="10" reverse-merge="false">
>                  <author>dsahlberg</author>
> <date>2021-08-08T19:29:10.860725Z</date>
>                  <msg>add 1 file</msg>
>              </logentry>
>          </logentry>
>      </logentry>
> ]]]
>

danielsh: 1. Re your "I'm proposing", how would with/without XML compare to
each other? Cf. site/tools/ compiling XML output into plain output.

(It seems I didn't keep my test repo, but I hope the examples should be
readable anyway).

For readability, I have reformatted the XML with proper indentation:
[[[
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
--use-merge-history --search 'merge a'
------------------------------------------------------------------------
r5 | dsahlberg | 2021-08-09 02:31:34 +0200 (Mon, 09 Aug 2021) | 1 line
Merged via: r6

merge a to b
------------------------------------------------------------------------
dsahlberg@daniel-2020:~/temp/wc/c$ ~/svn_trunk_4711/subversion/svn/svn log
--use-merge-history --search 'merge a' --xml
<?xml version="1.0" encoding="UTF-8"?>
<log>
    <logentry revision="6">
        <logentry reverse-merge="false" revision="5">
            <author>dsahlberg</author>
            <date>2021-08-09T00:31:34.969440Z</date>
            <msg>merge a to b</msg>
        </logentry>
    </logentry>
</log>
]]]

In plain text output the inheritance is show by the "Merged via:" line, in
XML is found by examining the tree.

I checked site/tools/upcoming.py. It doesn't use "-g" (which is responsible
for generating the recursive tree) so it expects all logentry elements to
be children of <log>, ie, it would not consider revision 5 above. Any tool
that is using "-g" should already be able to traverse a tree with logentry
elements that are children of other logentry elements (which could be
children of other logentry elements which ...).


> The most simple solution is to add the opening tag (<logentry
> revision="n">) whenever a log entry has children:
> [[[
>
> Index: subversion/svn/log-cmd.c
> ===================================================================
> --- subversion/svn/log-cmd.c    (revision 1892053)
> +++ subversion/svn/log-cmd.c    (working copy)
> @@ -552,6 +552,7 @@
>         return SVN_NO_ERROR;
>       }
>
> +  revstr = apr_psprintf(pool, "%ld", log_entry->revision);
>     /* Match search pattern before XML-escaping. */
>     if (lb->search_patterns &&
>         ! match_search_patterns(lb->search_patterns, author, date, message,
> @@ -559,6 +560,9 @@
>       {
>         if (log_entry->has_children)
>           {
> +          svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "logentry",
> +                              "revision", revstr, SVN_VA_NULL);
> +          SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
>             if (! lb->merge_stack)
>               lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(svn_revnum_t));
>
> @@ -575,7 +579,6 @@
>     if (message)
>       message = svn_xml_fuzzy_escape(message, pool);
>
> -  revstr = apr_psprintf(pool, "%ld", log_entry->revision);
>     /* <logentry revision="xxx"> */
>     if (lb->merge_stack && lb->merge_stack->nelts > 0)
>       {
>
> ]]]
>
> The problem with this solution is that it might emit empty XML elements,
> like this:
>
> [[[
>
>      <logentry revision="6">
>      </logentry>
> ]]]
>
> Depending on how the XML parser is built and how the information is
> used/displayed this may or may not be a problem. As far as I understand,
> most (if not all) tags are optional so it should not be an issue that
> there are no children (for example <msg></msg>). But it may be that an
> empty revision 6 is displayed in this case.
>
> A more complex solution would require keeping track of the XML tags that
> should be emitted if a child matches the search pattern. There is
> already an array (lb->merge_stack) that could probably be used for this
> purpose. Instead of just storing the revision number it would have to
> store if the opening tag has been emitted or not. When a child matches
> the search pattern, all parent nodes that has not previously been
> emitted will be emitted. When the library signals to close a tag, the
> code checks if the corresponding open tag has been emitted or not.
>
> The following is a concept for such a patch (I know HACKING has some
> things to say about how to name typedefs etc, but I don't have time to
> study it now, there are also whitespace issues). If this is a more
> desirable solution (ie, I get some +1 (concept)) then I will cleanup the
> patch and repost.
>
> [[[
>
> Index: subversion/svn/log-cmd.c
> ===================================================================
> --- subversion/svn/log-cmd.c    (revision 1892053)
> +++ subversion/svn/log-cmd.c    (working copy)
> @@ -45,6 +45,11 @@
>
>   #include "svn_private_config.h"
>
> +typedef struct merge_stack {
> +  svn_revnum_t rev;
> +  svn_boolean_t emitted;
> +} merge_stack_t;
> +
>
>   /*** Code. ***/
>
> @@ -355,10 +360,14 @@
>       {
>         if (log_entry->has_children)
>           {
> +      merge_stack_t ms;
> +
>             if (! lb->merge_stack)
> -            lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(svn_revnum_t));
> +            lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(merge_stack_t));
>
> -          APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) =
> log_entry->revision;
> +      ms.rev = log_entry->revision;
> +      ms.emitted = FALSE;
> +          APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
>           }
>
>         return SVN_NO_ERROR;
> @@ -435,10 +444,10 @@
>         iterpool = svn_pool_create(pool);
>         for (i = 0; i < lb->merge_stack->nelts; i++)
>           {
> -          svn_revnum_t rev = APR_ARRAY_IDX(lb->merge_stack, i,
> svn_revnum_t);
> +          merge_stack_t ms = APR_ARRAY_IDX(lb->merge_stack, i,
> merge_stack_t);
>
>             svn_pool_clear(iterpool);
> -          SVN_ERR(svn_cmdline_printf(iterpool, " r%ld%c", rev,
> +          SVN_ERR(svn_cmdline_printf(iterpool, " r%ld%c", ms.rev,
>                                        i == lb->merge_stack->nelts - 1 ?
> '\n' : ','));
>           }
> @@ -475,10 +484,14 @@
>
>     if (log_entry->has_children)
>       {
> +      merge_stack_t ms;
> +
>         if (! lb->merge_stack)
> -        lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(svn_revnum_t));
> +        lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(merge_stack_t));
>
> -      APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = log_entry->revision;
> +      ms.rev = log_entry->revision;
> +      ms.emitted = FALSE;
> +      APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
>       }
>
>     return SVN_NO_ERROR;
> @@ -533,6 +546,7 @@
>     const char *author;
>     const char *date;
>     const char *message;
> +  int i;
>
>     if (lb->ctx->cancel_func)
> SVN_ERR(lb->ctx->cancel_func(lb->ctx->cancel_baton));
> @@ -544,11 +558,14 @@
>
>     if (! SVN_IS_VALID_REVNUM(log_entry->revision))
>       {
> -      svn_xml_make_close_tag(&sb, pool, "logentry");
> -      SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
> -      if (lb->merge_stack)
> -        apr_array_pop(lb->merge_stack);
> -
> +      if (lb->merge_stack) {
> +    if (lb->merge_stack->nelts > 0 && (APR_ARRAY_IDX(lb->merge_stack,
> lb->merge_stack->nelts-1, merge_stack_t).emitted))
> +      {
> +        svn_xml_make_close_tag(&sb, pool, "logentry");
> +        SVN_ERR(svn_cl__error_checked_fputs(sb->data, stdout));
> +      }
> +    apr_array_pop(lb->merge_stack);
> +      }
>         return SVN_NO_ERROR;
>       }
>
> @@ -559,15 +576,32 @@
>       {
>         if (log_entry->has_children)
>           {
> +      merge_stack_t ms;
>             if (! lb->merge_stack)
> -            lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(svn_revnum_t));
> +            lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(merge_stack_t));
>
> -          APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) =
> log_entry->revision;
> +      ms.rev = log_entry->revision;
> +      ms.emitted = FALSE;
> +          APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
>           }
>
>         return SVN_NO_ERROR;
>       }
> -
> +  else if (lb->search_patterns && lb->merge_stack) {
> +    /* match_search_patterns returned true */
> +    for (i = 0; i < lb->merge_stack->nelts; i++)
> +      {
> +    if (!APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).emitted)
> +      {
> +        revstr = apr_psprintf(pool, "%ld",
> +                  APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).rev);
> +        svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "logentry",
> +                  "revision", revstr, SVN_VA_NULL);
> +        APR_ARRAY_IDX(lb->merge_stack, i, merge_stack_t).emitted = TRUE;
> +      }
> +      }
> +  }
> +
>     if (author)
>       author = svn_xml_fuzzy_escape(author, pool);
>     if (date)
> @@ -606,7 +640,6 @@
>     if (log_entry->changed_paths2)
>       {
>         apr_array_header_t *sorted_paths;
> -      int i;
>
>         /* <paths> */
>         svn_xml_make_open_tag(&sb, pool, svn_xml_normal, "paths",
> @@ -684,10 +717,14 @@
>
>     if (log_entry->has_children)
>       {
> +      merge_stack_t ms;
> +
>         if (! lb->merge_stack)
> -        lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(svn_revnum_t));
> +        lb->merge_stack = apr_array_make(lb->pool, 1,
> sizeof(merge_stack_t));
>
> -      APR_ARRAY_PUSH(lb->merge_stack, svn_revnum_t) = log_entry->revision;
> +      ms.rev = log_entry->revision;
> +      ms.emitted = TRUE;
> +      APR_ARRAY_PUSH(lb->merge_stack, merge_stack_t) = ms;
>       }
>     else
>       svn_xml_make_close_tag(&sb, pool, "logentry");
>
> ]]]
>
> I have run the testsuite with both patch versions without any problems.
> (I have obviously removed XFail() from the test for #4711 in log_tests.py).
>

danielsh: 2. Wondering about a new svn_stream_t type that wraps another
stream, caches writes until svn_stream_flush(), and has an
undo_last_write() function
danielsh: A 2-tuple struct may well be better :-)

There is still the need to store "state" somewhere to know if we should
flush() or undo(). Pseudo call-sequence (each line starting with - is a
call to the receiver), I'm using the same WC as above with an additional
revision 7. The calls with r-1 is the invalid revision number used to
generate the </logentry> closing tag after evaluating the merge history
(the -g).

- receiver r7
  ... match_search_patterns succeed. We want this revision. Output
contents. No children so output the closing tag immediately. Flush().
- receiver r6
  ... match_search_patterns fail, so we don't want this revision. It has
children so output anyway (but possibly ignoring <msg> etc.).
- receiver r5 (because of -g)
  ... match_search_patterns succeed, so we want this revision. Output
contents. No children so output the closing tag immediately. Flush().
- receiver -1 (this was the end of the merge history)
  ... Output closing tag. Now we need to decide if to Flush() or Undo()
depending on if anything was emitted since r6. (The contents of r5 has
already been flush()ed but we also need to flush the </logentry> tag).
The end [riding off into the sunset].

The only way I can come up with is to store this information in the
merge_stack.

That being said, I'm not opposed to use an undo()-able stream but I don't
think it is enough by itself.

Kind regards,
Daniel Sahlberg