Posted to users@subversion.apache.org by Kristofer <gu...@protonmail.com> on 2021/12/09 15:44:10 UTC

svndumpfilter: finding source commit for error?

Hi,

I'm doing some work on an old, bloated repo to get rid of some erroneous commits to reduce the disk size. I've done this a few years ago with success, but I'm currently getting into an error situation that I don't know how to handle.

I'm running something like
"svnadmin dump | svndumpfilter <lots of exclude patterns and an exclude file>"

This works fine until I get to
Committing revision 100000 as 100000
svndumpfilter: E200003: Missing merge source path 'project/branches/somebranch/idioticdir-i-will-filter'; try with --skip-missing-merge-sources

I do understand what the message means, but I cannot for the life of me figure out which commit it is that is causing the problem. I do not want to skip the merge source, I want to add the place it has been copied to into the filterfile.

Revision 100000 has nothing to do with that branch and neither has 100001, so I'm guessing that this is a stderr/stdout ordering mismatch in the printouts and the error actually happens later than 100000. I first checked whether anything is done on that specific branch around this point, and nope, that branch is "finished" by then.
Then I tried looking 200 revisions into the future to see if I could find a copy operation (basically running diff --summarize and grepping for "idioticdir-i-will-filter"), and I can't find it there either.
Tried playing around with svnlook and I fail there as well, but I'm not really used to this one and might be missing some nice option. I am running out of ideas how to find the new place it is copied to and add it to my filterfile.
Note: I cannot have a globbing filter on the name of the directory since that appears as a valid part of the project in many other places.

Any ideas why it is not printing both source and target of the copy, which is what I have seen other times with "copy source error"?
Any ideas on how to find the commit that is eluding me will be greatly appreciated.

Note: using svn 1.14
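
One way to hunt for the revision is to scan the dump stream itself for the path. This is a rough sketch under assumptions, not a Subversion tool: the function name `find_path_references` and the line-based matching are made up for illustration. It relies on the dump format recording each revision with a `Revision-number:` header, and it reports `Node-copyfrom-path:` headers and lines shaped like svn:mergeinfo values (`/path:revision-ranges`) that mention the path. File content that happens to contain such a line would show up as a false positive.

```python
import sys


def find_path_references(stream, needle):
    """Yield (revision, line) for dump lines that mention `needle`.

    `stream` is a binary file object holding `svnadmin dump` output.
    Only two kinds of lines are reported:
      * "Node-copyfrom-path: <path>" headers (copy sources)
      * "/<path>:<revision ranges>" lines, the shape of svn:mergeinfo values
    """
    needle = needle.strip("/").encode()
    rev_prefix = b"Revision-number: "
    revision = None
    for raw in stream:
        line = raw.rstrip(b"\r\n")
        if line.startswith(rev_prefix):
            value = line[len(rev_prefix):]
            if value.isdigit():  # ignore look-alike lines inside file content
                revision = int(value)
        elif line.startswith(b"Node-copyfrom-path: "):
            if needle in line:
                yield revision, line.decode(errors="replace")
        elif line.startswith(b"/") and b":" in line:
            # Mergeinfo lines look like "/some/path:1-5,7"; match the path part.
            if needle in line.rsplit(b":", 1)[0]:
                yield revision, line.decode(errors="replace")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Intended use: svnadmin dump REPO | python this_script.py PATH
    for rev, text in find_path_references(sys.stdin.buffer, sys.argv[1]):
        print(f"r{rev}: {text}")
```

The revision printed next to a mergeinfo hit is the one whose node properties carry the reference, which may well differ from the last "Committing revision" line printed before the error.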

Re: svndumpfilter: finding source commit for error?

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Dec 09, 2021 at 01:04:24PM -0600, Luke Mauldin wrote:
> Did this problematic commit occur due to a bug at the time in Subversion or a user error?
> 
> Luke

The log message suggests it was a deliberate choice, and there was
some discussion about it after the fact.
See https://svn.apache.org/r1295006 and https://marc.info/?t=139866097800008&r=1&w=2

Anyway, since this code lives in the ASF repository there was nothing
that could be done to undo the copy without impacting every ASF project.

Re: svndumpfilter: finding source commit for error?

Posted by Luke Mauldin <lu...@icloud.com>.
Did this problematic commit occur due to a bug at the time in Subversion or a user error?

Luke

> On Dec 9, 2021, at 12:47 PM, Stefan Sperling <st...@elego.de> wrote:
> 
> On Thu, Dec 09, 2021 at 06:29:14PM +0000, Kristofer wrote:
>> Hi Stefan and thanks for the hints. Then I need to line up a lot of
>> arguments, right? There's no "read from file" option that I can see. I'll try
>> that on the next failure, if I get one *fingers crossed*
> 
> Indeed, the list of path prefixes must be passed on the command line.
> Support for reading them from a file is not implemented, unfortunately.
> 
>> Btw, I also have this really silly commit sequence where someone managed to
>> delete the entire branches/ directory followed by a commit that brings it
>> back (not sure if the commiter used a proper reverse-merge or a copy). I
>> haven't understood if that can be "fixed" with svndumpfilter or if there's
>> some other way to do it. Those two are basically a null operation, but it
>> messes with things like "log --stop-on-copy"
> 
> I don't think there is an easy way to fix that via dump/load once other
> commits have been stacked on top. Any newer commit might refer to data
> stored in the problematic commit, due to deltification and other meta-data
> relationships between revisions. Copies in particular refer to node-rev-IDs
> which are generally hidden from the user, but which can be seen in the dump
> file, and which are not supposed to be changed.
> 
> There is a commit in Subversion's own trunk history which unfortunately
> did exactly the same thing. But it is water under the bridge at this point.
> People rarely have a need to go far enough back in history to cross it.

Fw: svndumpfilter: finding source commit for error?

Posted by Kristofer <gu...@protonmail.com>.
Hi again,
I tried what Stefan suggested below and used "svnadmin dump --exclude <bad path>" instead of piping the dump to svndumpfilter. I was tearing my hair out over failures until I realized that there's an undocumented difference between the two (at least I can't find it in their help texts).
In the --targets file for svndumpfilter I had the node as "rootDir/badDir", so I ran "svnadmin dump --exclude rootDir/badDir", which didn't get rid of any of my problems. After some tears and brute-forcing, I figured out that you need the initial slash there, so "svnadmin dump --exclude /rootDir/badDir" does the trick.
Just in case anyone else runs into this: remember the initial slash.
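
Since "svnadmin dump" has no read-from-file option for these prefixes, a small wrapper can reuse an existing svndumpfilter --targets file and add the leading slash along the way. This is only a sketch: the helper names `read_prefixes` and `build_dump_command` are made up, and it assumes that --exclude may be given once per prefix, as the "prefix(es)" wording in the help text suggests.

```python
def read_prefixes(lines):
    """Return path prefixes from an iterable of lines, one prefix per line.

    Blank lines are skipped. Each prefix is normalized to start with a
    single leading slash and to carry no trailing slash.
    """
    prefixes = []
    for line in lines:
        path = line.strip()
        if not path:
            continue
        prefixes.append("/" + path.strip("/"))
    return prefixes


def build_dump_command(repo_path, prefixes):
    """Build an argument list for `svnadmin dump`, one --exclude per prefix."""
    command = ["svnadmin", "dump", repo_path]
    for prefix in prefixes:
        command += ["--exclude", prefix]
    return command
```

The resulting list can be handed to subprocess.run() with stdout redirected to the new dump file; the repository path and the name of the prefix file are whatever applies locally.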

On Thursday, December 9th, 2021 at 7:47 PM, Stefan Sperling <st...@elego.de> wrote:
> On Thu, Dec 09, 2021 at 06:29:14PM +0000, Kristofer wrote:
> > Hi Stefan and thanks for the hints. Then I need to line up a lot of
> > arguments, right? There's no "read from file" option that I can see. I'll try
> > that on the next failure, if I get one fingers crossed
>
> Indeed, the list of path prefixes must be passed on the command line.
> Support for reading them from a file is not implemented, unfortunately.
>
> > Btw, I also have this really silly commit sequence where someone managed to
> > delete the entire branches/ directory followed by a commit that brings it
> > back (not sure if the commiter used a proper reverse-merge or a copy). I
> > haven't understood if that can be "fixed" with svndumpfilter or if there's
> > some other way to do it. Those two are basically a null operation, but it
> > messes with things like "log --stop-on-copy"
>
> I don't think there is an easy way to fix that via dump/load once other
> commits have been stacked on top. Any newer commit might refer to data
> stored in the problematic commit, due to deltification and other meta-data
> relationships between revisions. Copies in particular refer to node-rev-IDs
> which are generally hidden from the user, but which can be seen in the dump
> file, and which are not supposed to be changed.
>
> There is a commit in Subversion's own trunk history which unfortunately
> did exactly the same thing. But it is water under the bridge at this point.
> People rarely have a need to go far enough back in history to cross it.

Re: svndumpfilter: finding source commit for error?

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Dec 09, 2021 at 06:29:14PM +0000, Kristofer wrote:
> Hi Stefan and thanks for the hints. Then I need to line up a lot of
> arguments, right? There's no "read from file" option that I can see. I'll try
> that on the next failure, if I get one *fingers crossed*

Indeed, the list of path prefixes must be passed on the command line.
Support for reading them from a file is not implemented, unfortunately.

> Btw, I also have this really silly commit sequence where someone managed to
> delete the entire branches/ directory followed by a commit that brings it
> back (not sure if the commiter used a proper reverse-merge or a copy). I
> haven't understood if that can be "fixed" with svndumpfilter or if there's
> some other way to do it. Those two are basically a null operation, but it
> messes with things like "log --stop-on-copy"

I don't think there is an easy way to fix that via dump/load once other
commits have been stacked on top. Any newer commit might refer to data
stored in the problematic commit, due to deltification and other meta-data
relationships between revisions. Copies in particular refer to node-rev-IDs
which are generally hidden from the user, but which can be seen in the dump
file, and which are not supposed to be changed.

There is a commit in Subversion's own trunk history which unfortunately
did exactly the same thing. But it is water under the bridge at this point.
People rarely have a need to go far enough back in history to cross it.

Re: svndumpfilter: finding source commit for error?

Posted by Kristofer <gu...@protonmail.com>.
Hi Stefan and thanks for the hints. Then I need to line up a lot of arguments, right? There's no "read from file" option that I can see. I'll try that on the next failure, if I get one *fingers crossed*

Btw, I also have this really silly commit sequence where someone managed to delete the entire branches/ directory, followed by a commit that brings it back (not sure if the committer used a proper reverse-merge or a copy). I haven't understood whether that can be "fixed" with svndumpfilter or if there's some other way to do it. Those two commits are basically a null operation, but they mess with things like "log --stop-on-copy".

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, December 9th, 2021 at 7:08 PM, Stefan Sperling <st...@elego.de> wrote:

> On Thu, Dec 09, 2021 at 05:07:09PM +0000, Kristofer wrote:
> > And of course, soon after mailing, I found a workaround. I am not sure why it worked but anyways:
> > The problem was caused by the branch being merged to trunk, but the contents of the "bad directory" was never merged. Then I thought it would be fine to filter out the bad directory itself, especially since there were no subtree mergeinfo on it.
> > But it turned out that I could filter "baddir/badcontent1" and "baddir/badcontent2" and then the dumpfilter operation worked fine. I still don't know which revision the actual problem showed up, but it doesn't matter at this point :)
> > I suppose there is something with the mergeinfo that I do not understand, but since I managed to get around it, no need to bother this list further. Sorry about the noise.
>
> Nowadays we also support --include and --exclude options in "svnadmin dump"
> itself which avoids having to pass the dump stream through svndumpfilter.
>
> I don't know if this would have prevented your mergeinfo-related problem.
> But I wanted to mention this in case you would like to try it out.
> Or just keep it in mind in case you run into such a problem again.
>
> In general, the built-in --include and --exclude options can be smarter
> because they have access to context information while the dump stream
> is being created. Whereas svndumpfilter can only obtain information from
> the generated dump stream.
>
> $ svnadmin help dump
> [...]
> Using --exclude or --include gives results equivalent to authz-based
> path exclusions. In particular, when the source of a copy is
> excluded, the copy is transformed into an add (unlike in 'svndumpfilter').
>
> Valid options:
> [...]
>   --exclude ARG            : filter out nodes with given prefix(es) from dump
>   --include ARG            : filter out nodes without given prefix(es) from dump
> [...]

Re: svndumpfilter: finding source commit for error?

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Dec 09, 2021 at 05:07:09PM +0000, Kristofer wrote:
> And of course, soon after mailing, I found a workaround. I am not sure why it worked but anyways:
> The problem was caused by the branch being merged to trunk, but the contents of the "bad directory" was never merged. Then I thought it would be fine to filter out the bad directory itself, especially since there were no subtree mergeinfo on it.
> But it turned out that I could filter "baddir/badcontent1" and "baddir/badcontent2" and then the dumpfilter operation worked fine. I still don't know which revision the actual problem showed up, but it doesn't matter at this point :)
> I suppose there is something with the mergeinfo that I do not understand, but since I managed to get around it, no need to bother this list further. Sorry about the noise.
> 

Nowadays we also support --include and --exclude options in "svnadmin dump"
itself which avoids having to pass the dump stream through svndumpfilter.

I don't know if this would have prevented your mergeinfo-related problem.
But I wanted to mention this in case you would like to try it out.
Or just keep it in mind in case you run into such a problem again.

In general, the built-in --include and --exclude options can be smarter
because they have access to context information while the dump stream
is being created. Whereas svndumpfilter can only obtain information from
the generated dump stream.

$ svnadmin help dump
[...]
Using --exclude or --include gives results equivalent to authz-based
path exclusions. In particular, when the source of a copy is
excluded, the copy is transformed into an add (unlike in 'svndumpfilter').

Valid options:
[...]
  --exclude ARG            : filter out nodes with given prefix(es) from dump
  --include ARG            : filter out nodes without given prefix(es) from dump
[...]

Re: svndumpfilter: finding source commit for error?

Posted by Kristofer <gu...@protonmail.com>.
And of course, soon after mailing, I found a workaround. I am not sure why it worked but anyways:
The problem was caused by the branch being merged to trunk while the contents of the "bad directory" were never merged. So I thought it would be fine to filter out the bad directory itself, especially since there was no subtree mergeinfo on it.
But it turned out that I could filter "baddir/badcontent1" and "baddir/badcontent2" instead, and then the dumpfilter operation worked fine. I still don't know in which revision the actual problem showed up, but it doesn't matter at this point :)
I suppose there is something with the mergeinfo that I do not understand, but since I managed to get around it, no need to bother this list further. Sorry about the noise.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, December 9th, 2021 at 4:44 PM, Kristofer <gu...@protonmail.com> wrote:

> Hi,
>
> I'm doing some work on an old, bloated repo to get rid of some erroneous commits to reduce the disk size. I've done this a few years ago with success, but I'm currently getting into an error situation that I don't know how to handle.
>
> I'm running something like
> "svnadmin dump | svndumpfilter <lots of exclude patterns and an exclude file>"
>
> This works fine until I get to
> Committing revision 100000 as 100000
> svndumpfilter: E200003: Missing merge source path 'project/branches/somebranch/idioticdir-i-will-filter'; try with --skip-missing-merge-sources
>
> I do understand what the message means, but I cannot for the life of me figure out which commit it is that is causing the problem. I do not want to skip the merge source, I want to add the place it has been copied to into the filterfile.
>
> Revision 100000 has nothing to do with that branch and neither has 100001, so I'm guessing that this is a stderr/stdout ordering mismatch in the printouts and the error actually happens later than 100000. I first checked whether anything is done on that specific branch around this point, and nope, that branch is "finished" by then.
> Then I tried looking 200 revisions into the future to see if I could find a copy operation (basically running diff --summarize and grepping for "idioticdir-i-will-filter"), and I can't find it there either.
> Tried playing around with svnlook and I fail there as well, but I'm not really used to this one and might be missing some nice option. I am running out of ideas how to find the new place it is copied to and add it to my filterfile.
> Note: I cannot have a globbing filter on the name of the directory since that appears as a valid part of the project in many other places.
>
> Any ideas why it is not printing both source and target of the copy, which is what I have seen other times with "copy source error"?
> Any ideas on how to find the commit that is eluding me will be greatly appreciated.
>
> Note: using svn 1.14