Posted to users@subversion.apache.org by Alfred von Campe <al...@von-campe.com> on 2018/06/06 19:12:20 UTC

Problem with svndumpfilter

I’m trying to remove two sensitive directories from a repo so we can have a 3rd party work on it.  I first dumped the entire repo, and now I’m trying to remove two directories from one particular branch.  But svndumpfilter keeps failing as follows:

$ svndumpfilter exclude branches/develop/dir1 branches/develop/dir2  < repo.dump > repo-nodir12.dump
svndumpfilter: E200003: Invalid copy source path '/branches/develop/dir2'

I’ve tried this both from a full incremental dump of the repo as well as a non-incremental dump of the repo starting from the revision that branches/develop was created.  It always fails after the exact same revision.

Is there anything I can do to work around this issue?

Alfred


Re: Problem with svndumpfilter

Posted by Branko Čibej <br...@apache.org>.
On 08.06.2018 10:20, Doros Agathangelou wrote:
> Another option is to use Subdivision, a commercial tool that is
> designed to do exactly these kinds of operations, in other words
> deleting files, extracting files, and splitting repositories in two.
> Subdivision reads the repository structure in memory and 'understands'
> the structure of your repository. It can therefore create the
> necessary derived selections from your selections to make sure that
> the delete operation succeeds in the first pass. In this example,
> Subdivision would make sure that /branches/develop/dir2 would be
> available for the copy to be made from it (or that both source and
> copy would both be unavailable - depending on the user selections).
>
> Subdivision is a Windows application but we are working on a Linux
> solution too. You can try a demo by visiting http://subdi.vision
>
> Goes without saying that we are affiliated with Subdivision and we are
> sorry for the shameless advertising but we believe people in this
> group will benefit from knowing that Subdivision exists.


Before advertising proprietary, platform-specific tools, I'd first
suggest that the OP take a look at

    https://gitlab.com/esr/reposurgeon

which also supports these kinds of operations and is open source.




Re: Problem with svndumpfilter

Posted by Doros Agathangelou <nt...@gmail.com>.
Another option is to use Subdivision, a commercial tool that is
designed to do exactly these kinds of operations, in other words
deleting files, extracting files, and splitting repositories in two.
Subdivision reads the repository structure in memory and 'understands'
the structure of your repository. It can therefore create the
necessary derived selections from your selections to make sure that
the delete operation succeeds in the first pass. In this example,
Subdivision would make sure that /branches/develop/dir2 would be
available for the copy to be made from it (or that both source and
copy would both be unavailable - depending on the user selections).

Subdivision is a Windows application but we are working on a Linux
solution too. You can try a demo by visiting http://subdi.vision

Goes without saying that we are affiliated with Subdivision and we are
sorry for the shameless advertising but we believe people in this
group will benefit from knowing that Subdivision exists.




Re: Problem with svndumpfilter

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Jun 07, 2018 at 10:40:51AM -0400, Alfred von Campe wrote:
> Thanks, Stefan.  The path of least resistance for me is to use the script you pointed me to.  However, it seems that the exclude feature is not yet implemented:
> 
>   try:
>     if args[0] == 'include':
>       sys.exit(analyze_logs(map(sanitize_path, targets)))
>     elif args[0] == 'exclude':
>       usage_and_exit("Feature not implemented")
>     else:
>       usage_and_exit("Valid subcommands are 'include' and 'exclude'")
> 
> Is there a more recent version of this script?

I am afraid not. The link I provided points to the latest version of
this script we have in our repository.  We would welcome patches to
the script. However, since the problem has been fixed in SVN 1.10's
version of 'svnadmin' I think it makes more sense to just use 1.10.

You could install 1.10 binaries somewhere next to your existing SVN
installation and use the 1.10 svnadmin binary to create a dump file.
The resulting dump file will be compatible with older versions.

Re: Problem with svndumpfilter

Posted by Alfred von Campe <al...@von-campe.com>.
Thanks, Stefan.  The path of least resistance for me is to use the script you pointed me to.  However, it seems that the exclude feature is not yet implemented:

  try:
    if args[0] == 'include':
      sys.exit(analyze_logs(map(sanitize_path, targets)))
    elif args[0] == 'exclude':
      usage_and_exit("Feature not implemented")
    else:
      usage_and_exit("Valid subcommands are 'include' and 'exclude'")

Is there a more recent version of this script?

Alfred


> On Jun 7, 2018, at 3:11, Stefan Sperling <st...@elego.de> wrote:
> 
> On Thu, Jun 07, 2018 at 09:04:29AM +0200, Stefan Sperling wrote:
>> On Wed, Jun 06, 2018 at 03:12:20PM -0400, Alfred von Campe wrote:
>>> I’m trying to remove two sensitive directories from a repo so we can have a 3rd party work on it.  I first dumped the entire repo, and now I’m trying to remove two directories from one particular branch.  But svndumpfilter keeps failing as follows:
>>> 
>>> $ svndumpfilter exclude branches/develop/dir1 branches/develop/dir2  < repo.dump > repo-nodir12.dump
>>> svndumpfilter: E200003: Invalid copy source path '/branches/develop/dir2'
>>> 
>>> I’ve tried this both from a full incremental dump of the repo as well as a non-incremental dump of the repo starting from the revision that branches/develop was created.  It always fails after the exact same revision.
>>> 
>>> Is there anything I can do to work around this issue?
>>> 
>>> Alfred
>> 
>> Yes, you can update to 1.10 and use svnadmin dump --exclude
>> instead of using svndumpfilter.
>> See http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
>> 
>> An alternative that works with earlier releases is to set up svnsync
>> replication and configure authz access rules for the sync user which
>> forbid read access to the paths you want to exclude. svnsync will deal
>> with missing copy sources by translating copies into additions.
> 
> I forgot to mention the most immediate solution:
> Add the relevant copy sources to your argument list for 'svndumpfilter'.
> There is a script which can help with this:
> https://svn.apache.org/repos/asf/subversion/trunk/tools/server-side/svnpredumpfilter.py


Re: Problem with svndumpfilter

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Jun 07, 2018 at 09:04:29AM +0200, Stefan Sperling wrote:
> On Wed, Jun 06, 2018 at 03:12:20PM -0400, Alfred von Campe wrote:
> > I’m trying to remove two sensitive directories from a repo so we can have a 3rd party work on it.  I first dumped the entire repo, and now I’m trying to remove two directories from one particular branch.  But svndumpfilter keeps failing as follows:
> > 
> > $ svndumpfilter exclude branches/develop/dir1 branches/develop/dir2  < repo.dump > repo-nodir12.dump
> > svndumpfilter: E200003: Invalid copy source path '/branches/develop/dir2'
> > 
> > I’ve tried this both from a full incremental dump of the repo as well as a non-incremental dump of the repo starting from the revision that branches/develop was created.  It always fails after the exact same revision.
> > 
> > Is there anything I can do to work around this issue?
> > 
> > Alfred
> 
> Yes, you can update to 1.10 and use svnadmin dump --exclude
> instead of using svndumpfilter.
> See http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
> 
> An alternative that works with earlier releases is to set up svnsync
> replication and configure authz access rules for the sync user which
> forbid read access to the paths you want to exclude. svnsync will deal
> with missing copy sources by translating copies into additions.

I forgot to mention the most immediate solution:
Add the relevant copy sources to your argument list for 'svndumpfilter'.
There is a script which can help with this:
https://svn.apache.org/repos/asf/subversion/trunk/tools/server-side/svnpredumpfilter.py
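
To illustrate the idea for the 'exclude' case, here is a minimal Python
sketch (not the linked script, and not how it works internally). It walks
the records of a dump stream and reports every copy whose source lies under
an excluded path while its destination does not. Those destinations are the
copies svndumpfilter trips over, so one plausible reading of the advice is
to add them to the exclude list as well. The function names are made up for
this example, and the parsing assumes the usual dump layout of header
blocks followed by an optional body of Content-length bytes.

```python
def read_records(stream):
    """Yield the header dict of each record in a binary dump stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of stream
        if line == b"\n":
            continue                    # blank separator between records
        headers = {}
        while line not in (b"", b"\n"):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode()] = value.decode()
            line = stream.readline()
        # Skip the body so file contents are never mistaken for headers.
        stream.read(int(headers.get("Content-length", 0)))
        yield headers


def under(path, prefix):
    """True if path is prefix itself or lies somewhere below it."""
    return path == prefix or path.startswith(prefix + "/")


def extra_excludes(stream, excluded):
    """Return copy destinations whose copy source is already excluded."""
    excluded = [p.strip("/") for p in excluded]
    extra = []
    for headers in read_records(stream):
        src = headers.get("Node-copyfrom-path")
        dst = headers.get("Node-path")
        if src is None or dst is None:
            continue
        src, dst = src.strip("/"), dst.strip("/")
        skipped = excluded + extra      # later copies may chain off earlier ones
        if any(under(src, p) for p in skipped) and \
                not any(under(dst, p) for p in skipped):
            extra.append(dst)
    return extra
```

Called as extra_excludes(open("repo.dump", "rb"), ["branches/develop/dir1",
"branches/develop/dir2"]), it returns the additional paths to consider.
Note that it does not catch copies of a parent directory (for example a
copy of all of branches/develop), which would carry the sensitive
directories along without triggering the error.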

Re: Problem with svndumpfilter

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Thu, Jun 7, 2018 at 3:04 AM Stefan Sperling <st...@elego.de> wrote:
>
> On Wed, Jun 06, 2018 at 03:12:20PM -0400, Alfred von Campe wrote:
> > I’m trying to remove two sensitive directories from a repo so we can have a 3rd party work on it.  I first dumped the entire repo, and now I’m trying to remove two directories from one particular branch.  But svndumpfilter keeps failing as follows:
> >
> > $ svndumpfilter exclude branches/develop/dir1 branches/develop/dir2  < repo.dump > repo-nodir12.dump
> > svndumpfilter: E200003: Invalid copy source path '/branches/develop/dir2'
> >
> > I’ve tried this both from a full incremental dump of the repo as well as a non-incremental dump of the repo starting from the revision that branches/develop was created.  It always fails after the exact same revision.
> >
> > Is there anything I can do to work around this issue?
> >
> > Alfred
>
> Yes, you can update to 1.10 and use svnadmin dump --exclude
> instead of using svndumpfilter.
> See http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
>
> An alternative that works with earlier releases is to set up svnsync
> replication and configure authz access rules for the sync user which
> forbid read access to the paths you want to exclude. svnsync will deal
> with missing copy sources by translating copies into additions.

There is also a fairly nasty and somewhat hazardous trick I've used
effectively a few times to clean up a historically messy SVN layout.
Import it to git with git svn, trim debris branches and tags and
out-of-band content ruthlessly, use "git gc --aggressive" to flush
loose objects or branches *from the history*, then export that with
git svn into a new Subversion repository.  There are risks: git
doesn't handle keywords the same way Subversion does, for example, so
the transfer needs to be reviewed cautiously for svn:keywords,
svn:ignore, and svn:eol-style handling. But when you've a messy Subversion
layout where people dumped oddly named branches or parts of branches
in weird locations, or embedded bulky binary files accidentally and
left copies scattered around the history, it can be an invaluable
cleanup tool. It also doesn't require access to the Subversion server
to run "svnadmin dump", and it can be updated from the current running
Subversion master.

Part of the key is the use of "git gc --aggressive" to flush the
history of pruned content. Yes, this flushes history, and is
considered a sin, Sin, ***SIN*** by those who consider a complete and
pristine history of the entire source tree the whole point of a source
control system. But in practice, most branches and tags are
pointless after long enough, and it only takes a few accidental
commits of bulky binaries or of inappropriately imported content to
clutter and even legally encumber a source control system. Like
pruning any history, it needs to be done cautiously or important
material can be lost.

Re: Problem with svndumpfilter

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Jun 06, 2018 at 03:12:20PM -0400, Alfred von Campe wrote:
> I’m trying to remove two sensitive directories from a repo so we can have a 3rd party work on it.  I first dumped the entire repo, and now I’m trying to remove two directories from one particular branch.  But svndumpfilter keeps failing as follows:
> 
> $ svndumpfilter exclude branches/develop/dir1 branches/develop/dir2  < repo.dump > repo-nodir12.dump
> svndumpfilter: E200003: Invalid copy source path '/branches/develop/dir2'
> 
> I’ve tried this both from a full incremental dump of the repo as well as a non-incremental dump of the repo starting from the revision that branches/develop was created.  It always fails after the exact same revision.
> 
> Is there anything I can do to work around this issue?
> 
> Alfred

Yes, you can update to 1.10 and use svnadmin dump --exclude
instead of using svndumpfilter.
See http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
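
As a rough sketch of what that could look like when scripted, the Python
below assembles the command line with one --exclude per directory. The
repository path and output file name are placeholders, the helper names
are invented for this example, and writing the prefixes with a leading
slash is an assumption worth checking against 'svnadmin help dump' on
the 1.10 install.

```python
import subprocess


def svnadmin_dump_exclude_argv(repo_path, excluded, svnadmin="svnadmin"):
    """Build the argv for 'svnadmin dump' with one --exclude per path."""
    argv = [svnadmin, "dump", repo_path]
    for path in excluded:
        # Assumption: prefixes are given with a single leading slash.
        argv += ["--exclude", "/" + path.strip("/")]
    return argv


def dump_without(repo_path, excluded, out_file, svnadmin="svnadmin"):
    """Run the dump and write the filtered stream to out_file."""
    argv = svnadmin_dump_exclude_argv(repo_path, excluded, svnadmin)
    with open(out_file, "wb") as out:
        # check=True raises CalledProcessError if svnadmin exits non-zero.
        subprocess.run(argv, stdout=out, check=True)
```

For the case in this thread that would be something like
dump_without("/path/to/repo", ["branches/develop/dir1",
"branches/develop/dir2"], "repo-nodir12.dump"), with the svnadmin
argument pointing at the 1.10 binary if it is installed next to an
older release.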

An alternative that works with earlier releases is to set up svnsync
replication and configure authz access rules for the sync user which
forbid read access to the paths you want to exclude. svnsync will deal
with missing copy sources by translating copies into additions.
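
For the svnsync route, the authz rules might look roughly like what the
Python sketch below generates: read access at the root for the sync
user, and an empty rule (no access) for each path to be hidden. The user
name "syncuser" and the helper name are hypothetical, and the rules only
take effect if the source server is configured to use this authz file
and svnsync connects as that user.

```python
def authz_for_sync_user(sync_user, excluded):
    """Return authz file text hiding the excluded paths from sync_user."""
    lines = ["[/]", f"{sync_user} = r"]
    for path in excluded:
        # An empty right-hand side means the user has no access to this path.
        lines += ["", "[/" + path.strip("/") + "]", f"{sync_user} ="]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(authz_for_sync_user(
        "syncuser", ["branches/develop/dir1", "branches/develop/dir2"]
    ), end="")
```

The generated text has one section per path, which keeps it easy to
review by eye before putting it on the server.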