Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2015/07/10 01:23:46 UTC

Re: inconsistency between mergeinfo records

On Thu, Jun 25, 2015 at 3:29 PM, Stefan Hett <st...@egosoft.com> wrote:

>  Hi,
>
> as promised, answering the remaining questions now:
>

Hi Stefan,

First of all, thank you for the detailed feedback! It is very helpful.

I spent the last two weeks refactoring and reworking the tool. The main
changes:
* explicit --verbose mode, much quieter without it
* progress output
* only one common 'normalize' sub-command; actions selected by options
* 'analyze' and the new 'remove-branches' sub-command use the same code
  as 'normalize' and should therefore be consistent
* faster processing with large number of branches and / or high latency
  networks

I must also admit that the old 'normalize' command has a flaw that would
result in the removal of sub-tree mergeinfo that was NOT redundant. The
'analyze' output was correct, though, and the problem only manifested when
sub-tree mergeinfo could be completely removed. To check whether you have
been affected, do the following:

* check out the revision before any mergeinfo changes were committed.
* run the latest tool 'normalize --remove-redundant --remove-obsoletes'
* run 'svn pg svn:mergeinfo -R /path/to/working/copy --xml | grep "path= " '
  to get a list of nodes that still have mergeinfo on them
* run the same 'svn pg ...' command on the committed changes produced by
  the old tool
* compare the output, looking for mergeinfo that only the old tool removed
* if need be, manually fix them
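
The comparison in the last steps can also be scripted. What follows is only
a rough sketch, not part of the tool: it assumes that both 'svn pg ... --xml'
outputs were saved to files and that both runs used the same path prefix
(e.g. '.' from inside each working copy), so the reported paths are
comparable. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def mergeinfo_paths(propget_xml):
    """Return the set of node paths listed in 'svn pg --xml' output.

    Every <target path="..."> element is a node that carries the property.
    """
    root = ET.fromstring(propget_xml)
    return {t.get("path") for t in root.iter("target") if t.get("path") is not None}


def removed_only_by_old_tool(new_tool_xml, old_tool_xml):
    """Paths that keep mergeinfo after the new tool but lost it with the old one.

    These are the candidates for mergeinfo that was removed by mistake.
    """
    return sorted(mergeinfo_paths(new_tool_xml) - mergeinfo_paths(old_tool_xml))
```

For example, after 'svn pg svn:mergeinfo -R . --xml > new.xml' in the working
copy processed by the latest tool and the same command redirected to old.xml
in a checkout of the old tool's commit, reading both files and passing their
contents to removed_only_by_old_tool() lists the nodes to inspect manually.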

  [...]
>
>
>>  If you have any time requirements/considerations on your side which
>> would require/benefit from earlier feedback, pls let me know.
>>
>
>  Right now, we are all working towards the 1.9 RC. Feedback
>  in May or June would be nice.
>
>  The key question that I like to see answered is "Does the
>  tool do something useful?" For instance, it might become
>  ineffective in complex setups, we might need to add detection
>  of "mismatched" branches etc. We might also end up with
>  mergeinfo that is technically smaller but neither faster to
> process nor easier to understand.
>
> Overall I think this is a really great tool and is really valuable to
> administrators who have been running larger instances over a longer period
> of time.
>
> Initially the output of the analysis-log is kinda bloated. In my initial
> run the output produces a 2MB log-file. After reducing the amount of
> mergeinfo records (using normalization and dropping mergeinfos from obsolete
> branches) the output is quite good/reasonable. Some kind of documentation
> explaining what the different output statements mean and what the admin/user
> could do about it would be helpful though I think.
>

The commands have comprehensive documentation now.
The "what to do about it" part is yet to be addressed.


> Also it'd be good to add a more automated "one-step" command to simplify
> the usage even further. So a user/admin could simply start the tool (for
> instance svn-mergeinfo-normalizer clean-up-mergeinfo [path]
> -drop-obsolete-branches) which would more or less equal running the tool
> several times in the following sequence:
> svn-mergeinfo-normalizer.exe clear-obsoletes [path]
> svn-mergeinfo-normalizer.exe normalize [path]
> svn-mergeinfo-normalizer.exe combine-ranges [path]
> svn-mergeinfo-normalizer.exe analyse [path] -stats
>
> (where I'd envision the -stats param for the analyse command would print
> out a summary of how many remaining mergeinfos could not be normalized (if
> any) and pointing the user to run the full analysis step to get a more
> detailed output).
>

Try the new command structure and options. Is that roughly what you had in
mind?


> For the long term I hope that the functionality provided by this tool
> would become obsolete and the issues for which you have to use this tool
> are dealt with directly in the SVN core so these would not surface at all
> anymore (aka: no need to normalize mergeinfos manually).
>

Newer releases of SVN try to elide sub-tree mergeinfo as they go.
However, they can't be as thorough as this tool (for performance
reasons) and will not "fix" old mergeinfo. The one thing that they will
probably never do is remove mergeinfo for deleted branches because
that is a potentially destructive operation and only o.k. if you never
want to merge from those deleted branches again (99.9% of users).

A completely rewritten branching and merging logic may solve
the problem on a fundamental level.

>  So, there are the things that I'd love to get some feedback on:
>
>  * Does the tool work at all (no crashes, nothing obviously stupid)?
>
> I experienced no crashes and the output was quite clear to me (after
> facing the initial quite bloated analysis output ).
>
>  * Is the result of each reduction stage correct (as far as one can tell)?
>
> Already pointed out a few cases in my other replies. Will start a new
> thread to keep this with the further remaining cases I think I found.
>
>  * Is the tool feedback intelligible? How could that be improved?
>
> As suggested above some means to get a more statistical output especially
> for the initial run might be helpful. The header information atm is already
> a good start, but maybe adding/cleaning-up the output a bit further to
> produce maybe some statistic log would be more useful for the first run.
>
> For instance atm the analysis-output reports the actual non-existing
> branches for each path the tool checks-out. In my case that's around 100
> branches for each of the 400 paths... -> over 40.000 lines of branch info.
> More useful would be a list at the top with branches being obsolete (it's
> implicit that all subdirectories of the branch are obsolete if the parent
> path is non-existent).
>
> With the added reporting of obsolete branches this is even worse now.
>

With the latest changes, 'analyze' will only show "offending" branches and
their details by default. In --verbose mode, all branches are listed, but
only once per node (plus a summary of remaining branches).

Also, there is now a summary listing of all deleted branches that were
encountered.


> The other thing might be to add some stat-output to normalize /
> combine-ranges / clear-obsoletes to report how many mergeinfo entries could
> be normalized, or how many obsolete paths were removed.
> Since the commands can take a few minutes to run, some kind of "progress
> output" might also be useful, so the user knows the process did not
> deadlock or run into an endless loop.
>

There is progress info now while the log gets downloaded and for the
'normalize' command processing when not in --verbose mode.

>  * How effective is each stage / mergeinfo reduction command?
>  * How often does it completely elide sub-tree mergeinfo?
>  * What typical scenarios prevented sub-tree mergeinfo elision?
>
> I guess this was already answered by sending you the log files.
>

Yup. In particular, combining ranges was more effective than expected.

>  Up to here, you don't need to commit anything. If you are
>  convinced that the tool works correctly, you may commit
>  the results into some toy copy of your repository. Then the
>  following would be interesting:
>
>  * Are merges based on the reduced mergeinfo faster?
> * Do merges based on the reduced mergeinfo use less memory?
>  * Any anomalies?
>
>   I didn't spot any anomalies so far. With regard to performance and
> memory consumption I can't provide any numbers. One common use-case which
> is now significantly faster though is merging changes from one branch to
> the other, since it now only contains a few nodes with mergeinfo while
> before it had to commit changes to up to 400 nodes... So to us this is a
> really significant improvement.
>

I think the tool will be shipped with 1.10. The only problematic part is
that many vendors don't ship the tools but only core binaries. Maybe, it
gets merged into another tool.

-- Stefan^2.

Re: inconsistency between mergeinfo records

Posted by Stefan Hett <st...@egosoft.com>.
Hi Stefan^2,
>
> Hi Stefan,
>
> First of all, thank you for the detailed feedback! It is very helpful.
>
> I spent the last two weeks refactoring and reworking the tool. The 
> main changes:
> * explicit --verbose mode, much quieter without it
> * progress output
> * only one common 'normalize' sub-command; actions selected by options
> * 'analyze' and the new 'remove-branches' sub-command use the same code
>   as 'normalize' and should therefore be consistent
> * faster processing with large number of branches and / or high 
> latency networks
I was following your commits on the mailing list and just tried out the
latest version. Great work there. The changes done make this tool so 
much more usable compared to the old version. I also sent you the 
regenerated logs (based on the same branch/rev I used for the old tool) 
just in case comparing the different outputs is useful to you.

One small note: While I do understand the reasoning for the different 
default switches for analyze/normalize, initially I was surprised a bit 
that while normalize only ran with --remove-redundant, analyze created 
the output for --remove-redundant/--combine-ranges/--remove-obsoletes.
I would have expected that both commands consistently use the 
--remove-redundant option only, unless specified.

> I must also admit that the old 'normalize' command has a flaw that 
> would result
> in the removal of sub-tree mergeinfo that was NOT redundant. The 'analyze'
> output was correct, though, and the problem only manifested when sub-tree
> mergeinfo could be completely removed. To check whether you have been 
> affected,
> do the following:
>
> * c/o the revision before any m/i changes were committed.
> * run the latest tool 'normalize --remove-redundant --remove-obsoletes'
> * run 'svn pg svn:mergeinfo -R /path/to/working/copy --xml | grep 
> "path= " '
>   to get a list of nodes that still have mergeinfo on them
> * run the same 'svn pg ...' command on the committed changes produced by
>   the old tool
> * compare the output, looking for m/i that only the old tool removed
> * if need be, manually fix them
Thanks for letting me know. I ran the tests and verified I didn't run 
into this issue here.

>     Also it'd be good to add a more automated "one-step" command to
>     simplify the usage even further. So a user/admin could simply
>     start the tool (for instance svn-mergeinfo-normalizer
>     clean-up-mergeinfo [path] -drop-obsolete-branches) which would
>     more or less equal running the tool several times in the following
>     sequence:
>     svn-mergeinfo-normalizer.exe clear-obsoletes [path]
>     svn-mergeinfo-normalizer.exe normalize [path]
>     svn-mergeinfo-normalizer.exe combine-ranges [path]
>     svn-mergeinfo-normalizer.exe analyse [path] -stats
>
>     (where I'd envision the -stats param for the analyse command would
>     print out a summary of how many remaining mergeinfos could not be
>     normalized (if any) and pointing the user to run the full analysis
>     step to get a more detailed output).
>
>
> Try the new command structure and options. Is that roughly what you 
> had in mind?
Absolutely. Feels right to me (with the addition of the note mentioned 
above).

>     For the long term I hope that the functionality provided by this
>     tool would become obsolete and the issues for which you have to
>     use this tool are dealt with directly in the SVN core so these
>     would not surface at all anymore (aka: no need to normalize
>     mergeinfos manually).
>
> Newer releases of SVN try to elide sub-tree mergeinfo as they go.
> However, they can't be as thorough as this tool (for performance
> reasons) and will not "fix" old mergeinfo. The one thing that they will
> probably never do is remove mergeinfo for deleted branches because
> that is a potentially destructive operation and only o.k. if you never
> want to merge from those deleted branches again (99.9% of users).
Fully agreed with the dropping of branch-mergeinfos (in the current 
merge(info) design of SVN).
Maybe an interim solution which might be worth considering is 
integrating the other part of the normalizer's tool functionality into 
the SVN-cleanup-command to some extent (or integrate it into the
svn-command somehow).
At least that way it would be more easily accessible to a broader audience
and also more likely to be integrated in 3rd-party tools (like TSVN, etc.).

>     The other thing might be to add some stat-output to normalize /
>     combine-ranges / clear-obsoletes to report how many mergeinfo
>     entries could be normalized, or how many obsolete paths were removed.
>     Since the commands can take a few minutes to run, some kind of
>     "progress output" might also be useful, so the user knows the
>     process did not deadlock or run into an endless loop.
>
>
> There is progress info now while the log gets downloaded and for the
> 'normalize' command processing when not in --verbose mode.
I also see the progress info when running normalize -v. Honestly I
find this quite useful, but if you did not intend it that way, then I
guess I have to report it as a bug here (in this case voting for keeping
the bug though :) ).

Btw. Your idea to add a command to specify which branches are to be 
dropped when running the remove-obsoletes command: This is actually 
something I'd also use once it's available. :-) So looking forward to it 
being added.

Once more: thanks for all your work.

Regards,
Stefan