Posted to users@subversion.apache.org by Bryon Winger <br...@gmail.com> on 2013/04/02 23:32:05 UTC

Re: Splitting out project from repo

I am going through a similar process myself and have some questions about your concerns. I'm not trying to rock the boat, just looking for clarity on a few points.

For perspective, I am working with around 300 individual projects in a 70+ GB repository containing over 300k revisions.

> If I understand correctly, you manually retrieve each version where 
> the given path/project has changed in any way to afterwards dump those 
> revisions. Why is this better/faster than using svndumpfilter with 
> specifying an include path, but without the need to post process the 
> dump files? 

 
 
I personally don't see the advantage in waiting around for svnadmin dump to process every unrelated revision. For one project, I am only concerned with about 200 revisions, spread out over 210k unrelated revisions.

 

# This example took around 8 hours:

svnadmin dump /path/to/master | svndumpfilter include --drop-empty-revs \
    --renumber-revs "$PROJECT" > "$PROJECT.dump"

# However, when I run this on the same project:

for rev in $(svn log -r0:HEAD "file:///path/to/master/$PROJECT" | \
             grep -E '^r[0-9]+ \|' | cut -d " " -f1); do
    svnadmin dump --incremental -r "${rev:1}" /path/to/master | \
        svndumpfilter include "$PROJECT" >> "$PROJECT.dump"
done

 

… I can have a usable dump file in under 30 seconds. I realize this will take longer for larger projects, but I think it makes my point: ‘svnadmin dump’ still creates a full dump stream for each revision before svndumpfilter sees that revision and decides whether to keep it.
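
The per-revision loop depends on pulling revision numbers out of the 'svn log' header lines. A minimal sketch of that step as a reusable function follows; the name 'revs_from_log' is hypothetical. The '|' in the pattern has to be escaped: with GNU grep, an unescaped '^r[0-9]+ |' is an alternation with an empty branch, so it matches every line and the loop would also iterate over separator lines and log-message words.

```shell
# Hypothetical helper: read `svn log` output on stdin and print the revision
# numbers, one per line, without the leading "r". Header lines look like
#   r123 | author | date | N lines
# "\|" is a literal pipe; an unescaped "^r[0-9]+ |" would be an alternation
# with an empty branch, which (with GNU grep) matches every line.
revs_from_log() {
    grep -E '^r[0-9]+ \|' | cut -d ' ' -f1 | sed 's/^r//'
}
```

For example, svn log -q -r0:HEAD "file:///path/to/master/$PROJECT" | revs_from_log prints plain numbers (with -q, svn log omits the log messages), which can be passed straight to svnadmin dump --incremental -r "$rev" without the ${rev:1} trick.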

 

> Are you sure your approach doesn't need other paths 
> from the repo, e.g. other source paths from copy operations for 
> projects or stuff like that? 
>
 

I absolutely agree with checking for this. You can’t successfully pull out a single path using svnadmin dump / svndumpfilter if there are copies from a location outside of whatever you are filtering for.
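
One way to run that check before dumping is to scan 'svn log -v' for copies whose source lies outside the project. The sketch below assumes the usual changed-path format, e.g. '   A /proj/trunk/lib (from /shared/lib:7)'; the function name 'outside_copy_sources' is hypothetical. Any path it prints is a copy source that a plain svndumpfilter include of the project would not contain.

```shell
# Hypothetical helper: read `svn log -v` output on stdin and print every
# distinct copy source that lies outside the repository path given as $1
# (for example /myproject). Changed-path lines look like
#   A /myproject/trunk/lib (from /shared/lib:7)
outside_copy_sources() {
    prefix=$1
    sed -n 's/^ *[ADMR] .* (from \(.*\):[0-9][0-9]*)$/\1/p' |
        while IFS= read -r src; do
            case $src in
                "$prefix" | "$prefix"/*) ;;  # copied from inside the project
                *) printf '%s\n' "$src" ;;   # copied from outside: needs a look
            esac
        done | sort -u
}
```

For example: svn log -v -r0:HEAD "file:///path/to/master/$PROJECT" | outside_copy_sources "/$PROJECT". An empty result suggests, but does not prove, that filtering on the project path alone will work.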

 

I did notice that using svnrdump pointing to url/project seems to get around the outside-copy-sources issue, but I think that’s another discussion altogether.

 

> > svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak 
>
> Adding to revision files with >> should be impossible in your 
> approach. 

 
 
Are you saying that appending to an existing dump file is a problem in general, or just with all of his node-path processing? I have had no trouble appending to existing dump files.
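
When appending revision by revision, a cheap sanity check is that the revision numbers in the combined file come out strictly increasing, since the revisions were appended in order. The helper below is a hypothetical, line-based sketch named 'check_appended_dump': it does not parse Content-length, so file contents that happen to contain a 'Revision-number: ' line could mislead it.

```shell
# Hypothetical helper (heuristic): list the Revision-number headers of a dump
# file assembled by appending incremental dumps, and exit non-zero unless the
# numbers are strictly increasing (no duplicates, nothing out of order).
# It scans lines without honoring Content-length.
check_appended_dump() {
    grep '^Revision-number: ' "$1" | cut -d ' ' -f2 | sort -n -c -u
}
```

sort -c -u only checks its input; it prints a message to stderr and exits non-zero at the first duplicate or out-of-order number.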

 

Thanks,

Bryon Winger

Re: Splitting out project from repo

Posted by Bryon Winger <br...@gmail.com>.
>
> for rev in $(svn log -r0:HEAD "${url}/${project}" | \
>              grep -E '^r[0-9]+ \|' | cut -d " " -f1); do
>     svnrdump dump --incremental -r "${rev:1}" "${url}/${project}" >> "${project}.dump"
> done
>
> Basically, I am only dumping (incrementally) the revisions which actually
> affect the path in question.
>
 
I have since discovered that incrementally dumping specific revisions via svnrdump is not as safe as I previously thought. Some paths that were copied from outside sources did not get included because I skipped the revision they were copied from.

So to correct myself and save others frustration: don't skip revisions with svnrdump (as in my example above) unless you are absolutely sure you won't be missing anything.
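
A sketch of the "don't skip revisions" advice: instead of dumping a hand-picked list of revisions, compute one contiguous range and dump all of it, so revisions that copies refer to are not left out. The helper name 'rev_span' is hypothetical, and this assumes every revision the project's copies depend on falls inside that range; if in doubt, dump the project's entire history instead.

```shell
# Hypothetical helper: read revision numbers (one per line) on stdin and
# print the contiguous range "FIRST:LAST" that covers all of them, so the
# dump can include every revision in between instead of skipping some.
rev_span() {
    sort -n | awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR > 0) print first ":" last }'
}
```

For example: span=$(svn log -q -r0:HEAD "${url}/${project}" | grep -E '^r[0-9]+ \|' | cut -d ' ' -f1 | sed 's/^r//' | rev_span), then svnrdump dump -r "$span" "${url}/${project}" > "${project}.dump".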
 
Bryon

Re: Splitting out project from repo

Posted by Bryon Winger <br...@gmail.com>.

>  You probably still want the svndumpfilter processing to drop empty 
> revisions before loading it in a new repository.
>
 
 
I believe that the current version of svndumpfilter only operates on version 2 dump streams, which is what svnadmin dump produces (unless --deltas is given). svnrdump produces a version 3 dump stream, which is not compatible with svndumpfilter.
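
Since the two tools produce different formats, it can help to check which version a dump file declares before piping it into svndumpfilter. A minimal sketch, assuming the version header is on the first line as 'SVN-fs-dump-format-version: N'; the function name 'dump_format_version' is hypothetical.

```shell
# Hypothetical helper: print the dump format version declared on the first
# line of a dump file ("SVN-fs-dump-format-version: N"), or nothing if that
# header is missing.
dump_format_version() {
    sed -n '1s/^SVN-fs-dump-format-version: \([0-9][0-9]*\)$/\1/p' "$1"
}
```

For example, only feed "$PROJECT.dump" to svndumpfilter when dump_format_version "$PROJECT.dump" prints 2.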

 

That being said, I am able to avoid dumping empty revisions (from a previous dump/load) with svnrdump by running something along these lines:

 

for rev in $(svn log -r0:HEAD "${url}/${project}" | \
             grep -E '^r[0-9]+ \|' | cut -d " " -f1); do
    svnrdump dump --incremental -r "${rev:1}" "${url}/${project}" >> "${project}.dump"
done

 

Basically, I am only dumping (incrementally) the revisions which actually affect the path in question. This is obviously not as fast as doing everything server-side, but it does appear to work around files or directories copied from paths outside of the particular project path: the outside copy sources are dumped in full instead of as a simple reference to where they were originally copied from.
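
To check that outside copy sources really were dumped in full, one rough test is to list the copy sources still recorded in the resulting dump: any Node-copyfrom-path outside the project would have to exist in the target repository at load time. The sketch below is a hypothetical, line-based helper named 'dump_copy_sources'; it does not honor Content-length, so it can be fooled by file contents that contain such a header line.

```shell
# Hypothetical helper (heuristic): print each distinct Node-copyfrom-path
# recorded in a dump file. If outside copies were expanded into full adds,
# every path printed here should lie under the project path.
dump_copy_sources() {
    sed -n 's/^Node-copyfrom-path: //p' "$1" | sort -u
}
```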

 

I would appreciate some feedback if I’m missing something or if the above statement is inaccurate or unreliable. In my tests, everything appears to be the same once loaded into a fresh repository, checked out in full, and diffed against the originals.

 

There is a very brief mention in the svn-book of appending to an existing dump file, so I expect that to be safe in general. It can be found in the “Repository Backup” section (http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.backup) by searching for ‘appending’.

 

Thanks,

Bryon Winger

RE: Splitting out project from repo

Posted by Bert Huijben <be...@qqmail.nl>.
Hi,

The ‘svnrdump’ tool that was added in Subversion 1.7 might do exactly what you want to do.

This tool allows creating a dump file from a URL (e.g. file:///path/to/repos) and should skip unrelated paths for you during the repository processing.

 

You probably still want the svndumpfilter processing to drop empty revisions before loading it in a new repository.
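
Before (or instead of) running svndumpfilter --drop-empty-revs, it can be useful to see which revisions in a dump are empty, i.e. have a revision record but no Node-path records. The awk sketch below only reports them; it does not rewrite the dump. The function name 'empty_revisions' is hypothetical, and like any line-based scan it ignores Content-length, so treat it as a heuristic. Revision 0 of a full dump always appears, since it never carries node records.

```shell
# Hypothetical helper (heuristic): print the number of every revision record
# in a dump file that is not followed by any Node-path record before the
# next revision record (or end of file), i.e. the empty revisions.
empty_revisions() {
    awk '
        /^Revision-number: / {
            if (have_rev && nodes == 0) print rev
            rev = $2; have_rev = 1; nodes = 0
            next
        }
        /^Node-path: / { nodes++ }
        END { if (have_rev && nodes == 0) print rev }
    ' "$1"
}
```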

 

Bert

 

From: Bryon Winger [mailto:bryonwinger@gmail.com] 
Sent: Tuesday, 2 April 2013 23:32
To: subversion_users@googlegroups.com
Cc: users@subversion.apache.org; tschoening@am-soft.de
Subject: Re: Splitting out project from repo

 



Re: Splitting out project from repo

Posted by Thorsten Schöning <ts...@am-soft.de>.
Hello Bryon Winger,
on Tuesday, 2 April 2013 at 23:32, you wrote:

> Are you saying that appending to an existing dump file in general is a
> problem or just with all of his node-path processing? I have had no
> trouble appending to existing dump files.

I don't know whether appending to a dump file is supported or not; I just meant his revision-based file naming approach.

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail:Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow