Posted to users@subversion.apache.org by "Trent W. Buck" <tr...@gmail.com> on 2013/09/09 03:13:12 UTC

Breaking up a monolithic repository

I have inherited a single monolithic repo for all the company's
projects.  I want to migrate to one repo per project. (One-way,
one-time migration.)

Following the red-bean book[0], I first tried svnadmin, which was
really slow, and eventually crashed because some files were copied
into projects/133_Redacted from a different subdir.

    rm -rf delete-me
    svnadmin create delete-me
    svnadmin dump /srv/svn/Frobozz |
    svndumpfilter --drop-empty-revs include projects/133_Redacted |
    svnadmin load delete-me

    [...]
    svndumpfilter: Invalid copy source path '/EE/ProjectDocs/133_Redacted/REDACTED.pdf'
    svnadmin: Can't write to stream: Broken pipe
    <<< Started new transaction, based on original revision 4182
    svnadmin: File not found: transaction '0-0', path 'projects/133_Redacted' * adding path : projects/133_Redacted ...
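
One possible way around the "Invalid copy source path" error, sketched here under assumptions, is to add the copy source to the include list so that svndumpfilter can still resolve the copy. The helper name `missing_copy_sources` and the log file name `filter.log` are made up for illustration; only the error text comes from the output above. Including the extra path also drags its history into the new repository, which may or may not be acceptable.

```shell
#!/bin/sh
# Sketch: pull the offending copy-source paths out of svndumpfilter's
# stderr (read on stdin), one per line. The helper name is hypothetical.
missing_copy_sources() {
    sed -n "s/^svndumpfilter: Invalid copy source path '\(.*\)'\$/\1/p"
}

# Possible usage (not run here; filter.log is a hypothetical file name):
#
#   svnadmin dump /srv/svn/Frobozz |
#   svndumpfilter --drop-empty-revs include projects/133_Redacted 2>filter.log |
#   svnadmin load delete-me
#   missing_copy_sources <filter.log
#
# then re-run the pipeline with each reported path added to the include
# list, for example:
#
#   svndumpfilter --drop-empty-revs include \
#       projects/133_Redacted EE/ProjectDocs/133_Redacted
```

The filter stops at the first bad copy source, so this may take several rounds before the load goes through.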

Freenode's #svn IRC channel advised me to use svnsync instead.  That
was really slow, eventually succeeded, but left a tonne of empty
commit messages.

    rm -rf delete-me-2
    svnadmin create delete-me-2
    ln -s /bin/true delete-me-2/hooks/pre-revprop-change
    svnsync init file://$PWD/delete-me-2 file:///srv/svn/Frobozz/projects/133_Redacted
    svnsync sync file://$PWD/delete-me-2
    rm delete-me-2/hooks/pre-revprop-change

So then I thought to chain the two approaches. This didn't work -- the
empty revs were not removed. I guess svndumpfilter --drop-empty-revs
is only smart enough to drop the revs that have just *become* empty?

    rm -rf delete-me-3
    svnadmin create delete-me-3
    svnadmin dump delete-me-2 |
    svndumpfilter --drop-empty-revs exclude /canthappen |
    svnadmin load delete-me-3

I also thought of converting to git fast-export format and back again,
but AFAICT there is no way to import a fast-export into a svn repo.

I'm stuck.  Since it's no fun to have tens of thousands of empty revs
in each project repo, my current approach is to leave existing
projects in the monolithic repo, and new projects get separate repos.

What else can I do?

[0] http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html

Re: Breaking up a monolithic repository

Posted by "Trent W. Buck" <tr...@gmail.com>.

trentbuck@gmail.com (Trent W. Buck) writes:

> So then I thought to chain the two approaches. This didn't work -- the
> empty revs were not removed. I guess svndumpfilter --drop-empty-revs
> is only smart enough to drop the revs that have just *become* empty?
>
>     rm -rf delete-me-3
>     svnadmin create delete-me-3
>     svnadmin dump delete-me-2 |
>     svndumpfilter --drop-empty-revs exclude /canthappen |
>     svnadmin load delete-me-3

A helpful offlist correspondent noted svn 1.8 has --drop-all-empty-revs,
so I might try building that long enough to try that option.
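
For reference, a sketch of how that might look with a 1.8 svndumpfilter, plus a rough way to check how many empty revisions a dump stream still contains. The function name `count_empty_revs` is invented here. It treats any revision other than r0 that has no Node-path records as empty, and it would miscount if file contents happened to contain lines starting with those headers.

```shell
#!/bin/sh
# Sketch: count revisions (other than r0) in a dump stream on stdin that
# carry no node records at all. The function name is hypothetical.
count_empty_revs() {
    awk '
        function tally() { if (seen && rev != 0 && nodes == 0) empty++ }
        /^Revision-number: / { tally(); seen = 1; rev = $2; nodes = 0 }
        /^Node-path: /       { nodes++ }
        END                  { tally(); print empty + 0 }
    '
}

# Possible usage with svn 1.8 tools (not run here):
#
#   svnadmin dump delete-me-2 |
#   svndumpfilter --drop-all-empty-revs exclude /canthappen |
#   svnadmin load delete-me-3
#
#   svnadmin dump delete-me-3 | count_empty_revs
```

Running the count on delete-me-2 first gives a baseline to compare against.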

Re: Breaking up a monolithic repository

Posted by "Trent W. Buck" <tr...@gmail.com>.

Geoff Field <Ge...@aapl.com.au> writes:

>> I get the impression that $company's projects mostly have a finite
>> lifespan (a couple of years),
>
> By "lifespan", what exactly do you mean?  At my company, the
> individual projects might be in production within anywhere from 6
> months to 2 years after start of development, be manufactured for two
> to four years, then go into support mode for up to 7 years (or more).

That's probably a more accurate way of putting it.
But the bottom line is migration through attrition ought to work.

> It's entirely possible that the empty commit messages you reported
> were due to users not actually entering anything in the messages.
> Many of the commit messages I've seen (particularly from non-software
> people, but even from a few of those) are less informative than I'd
> like - a lot are totally empty.

Ah, sorry, I wasn't clear.  Supposing the repo has two subdirs:

    projects/1_Muffins
    projects/2_Cakes

Then when I use svnsync to make a repo that only contains
projects/2_Cakes, I still have a bunch of commits that WERE making
changes to projects/1_Muffins -- so they have commit messages and
authors and times and suchlike metadata -- but they don't actually *do*
anything anymore, because the files they edited aren't in
projects/2_Cakes.

If there were only two projects, it wouldn't be too bad, but suppose 100
projects, with 1000 commits each.  If I use svnsync, I end up with 100
repos, each of which has 99,000 useless commits.

RE: Breaking up a monolithic repository

Posted by Geoff Field <Ge...@aapl.com.au>.

> From: Trent W. Buck
> Sent: Monday, 9 September 2013 12:17 PM
> Nico Kadel-Garcia <nk...@gmail.com> writes:
> 
> > Lock the existing repo: Do clean exports, and imports, to new
> > repositories with the new layout, with a README.md or other guideline
> > to where the legacy repository exists. You lose the infinitely
> > preserved history this way, but for most working software projects,
> > you don't *need* that. And it's a good opportunity to discard
> > materials, such as bulky binaries or security sensitive files with
> > plain text passwords.
> 
> Ah, sorry, I forgot to mention that preserving history was a 
> hard requirement handed down from higher up.

You *could* argue that the existing repository preserves the history.
However, I think I know what they mean.

> I get the impression that $company's projects mostly have a 
> finite lifespan (a couple of years),

By "lifespan", what exactly do you mean?  At my company, the individual projects might be in production within anywhere from 6 months to 2 years after start of development, be manufactured for two to four years, then go into support mode for up to 7 years (or more).

> so I think that approach 
> ends up being very similar to my current plan of creating new 
> projects as new repos, and letting the monolithic repo die 
> out via attrition.

That sounds like an easy way to do things.

> I don't actually know exactly what they put in their repos; I 
> think it's about half "huge unpacked source tarball I 
> downloaded from somewhere then tinkered with" and half huge 
> CAD files and .docx contracts.

It's entirely possible that the empty commit messages you reported were due to users not actually entering anything in the messages.  Many of the commit messages I've seen (particularly from non-software people, but even from a few of those) are less informative than I'd like - a lot are totally empty.

Regards,

Geoff


Re: Breaking up a monolithic repository

Posted by "Trent W. Buck" <tr...@gmail.com>.

Nico Kadel-Garcia <nk...@gmail.com> writes:

> Lock the existing repo: Do clean exports, and imports, to new repositories
> with the new layout, with a README.md or other guideline to where the
> legacy repository exists. You lose the infinitely preserved history this
> way, but for most working software projects, you don't *need* that. And
> it's a good opportunity to discard materials, such as bulky binaries or
> security sensitive files with plain text passwords.

Ah, sorry, I forgot to mention that preserving history was a hard
requirement handed down from higher up.

I get the impression that $company's projects mostly have a finite
lifespan (a couple of years), so I think that approach ends up being
very similar to my current plan of creating new projects as new repos,
and letting the monolithic repo die out via attrition.

I don't actually know exactly what they put in their repos; I think it's
about half "huge unpacked source tarball I downloaded from somewhere
then tinkered with" and half huge CAD files and .docx contracts.

Re: Breaking up a monolithic repository

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

Lock the existing repo: Do clean exports, and imports, to new repositories
with the new layout, with a README.md or other guideline to where the
legacy repository exists. You lose the infinitely preserved history this
way, but for most working software projects, you don't *need* that. And
it's a good opportunity to discard materials, such as bulky binaries or
security sensitive files with plain text passwords.
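
A minimal sketch of the "lock the existing repo" step, assuming the lock is done with a pre-commit hook that rejects every commit. The repository path and the wording of the message are placeholders, not anything from this thread.

```shell
#!/bin/sh
# Hypothetical repository path; adjust to the real one.
REPO="${REPO:-./Frobozz-example}"

# hooks/ already exists in a real repository; mkdir -p only keeps this
# sketch runnable on its own.
mkdir -p "$REPO/hooks"

# A pre-commit hook that exits non-zero aborts the commit, and whatever
# it writes to stderr is passed back to the committing client.
cat > "$REPO/hooks/pre-commit" <<'EOF'
#!/bin/sh
echo "This repository is a read-only archive; new work goes in the per-project repositories." >&2
exit 1
EOF
chmod +x "$REPO/hooks/pre-commit"

# The export/import itself would then be along these lines (not run here,
# URLs are placeholders):
#
#   svn export file:///srv/svn/Frobozz/projects/133_Redacted 133_Redacted
#   svn import 133_Redacted file:///srv/svn/133_Redacted -m "Import from Frobozz"
```

Reads, checkouts and log queries against the old repository keep working; only new commits are refused.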


On Sun, Sep 8, 2013 at 9:13 PM, Trent W. Buck <tr...@gmail.com> wrote:

> I have inherited a single monolithic repo for all the company's
> projects.  I want to migrate to one repo per project. (One-way,
> one-time migration.)
>
> Following the red-bean book[0], I first tried svnadmin, which was
> really slow, and eventually crashed because some files were copied
> into projects/133_Redacted from a different subdir.
>
>     rm -rf delete-me
>     svnadmin create delete-me
>     svnadmin dump /srv/svn/Frobozz |
>     svndumpfilter --drop-empty-revs include projects/133_Redacted |
>     svnadmin load delete-me
>
>     [...]
>     svndumpfilter: Invalid copy source path '/EE/ProjectDocs/133_Redacted/REDACTED.pdf'
>     svnadmin: Can't write to stream: Broken pipe
>     <<< Started new transaction, based on original revision 4182
>     svnadmin: File not found: transaction '0-0', path 'projects/133_Redacted' * adding path : projects/133_Redacted ...
>
> Freenode's #svn IRC channel advised me to use svnsync instead.  That
> was really slow, eventually succeeded, but left a tonne of empty
> commit messages
>
>     rm -rf delete-me-2
>     svnadmin create delete-me-2
>     ln -s /bin/true delete-me-2/hooks/pre-revprop-change
>     svnsync init file://$PWD/delete-me-2 file:///srv/svn/Frobozz/projects/133_Redacted
>     svnsync sync file://$PWD/delete-me-2
>     rm delete-me-2/hooks/pre-revprop-change
>
> So then I thought to chain the two approaches. This didn't work -- the
> empty revs were not removed. I guess svndumpfilter --drop-empty-revs
> is only smart enough to drop the revs that have just *become* empty?
>
>     rm -rf delete-me-3
>     svnadmin create delete-me-3
>     svnadmin dump delete-me-2 |
>     svndumpfilter --drop-empty-revs exclude /canthappen |
>     svnadmin load delete-me-3
>
> I also thought of converting to git fast-export format and back again,
> but AFAICT there is no way to import a fast-export into a svn repo.
>
> I'm stuck.  Since it's no fun to have tens of thousands of empty revs
> in each project repo, my current approach is to leave existing
> projects in the monolithic repo, and new projects get separate repos.
>
> What else can I do?
>
> [0] http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html
>
>

Re: Breaking up a monolithic repository

Posted by Les Mikesell <le...@gmail.com>.

On Wed, Sep 11, 2013 at 10:49 PM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
> Les, disk space isn't the issue for the empty revs. It's any operations that
> try to scan or assemble information from the revisions. 5000 empty "objects"
> is still a logistical burden, especially if assembling any kind of change
> history for the new repository.

I don't see how that imposes a bigger computational burden than the
same number of unrelated revisions did in the combined repo, which
typically is not a problem.  We are at rev 186767 on a large
multi-project repo which, although I wish it had been created as
separate repos for easier future maintenance, does not have serious
performance issues.

> And since the new repositories are
> effectively a rebase of a subset of the code, you don't normally *gain*
> anything from having empty revisions for code that is in the other new
> repositories. You can't meaningfully merge content between the new smaller
> repositories and the old repo, barring some seriously weird cases, so it's
> safer to treat them as completely distinct and not bother to preserve all
> the empty revisions.
>
> The "revision numbers are stored in support tickets" is the only reason I
> can think of to keep them.

Or pegged externals if they stay in the same relative location.  Or
any email, documentation or recorded discussion referring to the
changes in a revision.   My point is that any change that requires new
training or human intervention to fix something is never going to win
back that time.   Someone who completely understands the current
process and user base might be able to optimize and improve it with
drastic changes, but that seems unlikely if they are asking for advice
on a mail list.

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: Breaking up a monolithic repository

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

Les, disk space isn't the issue for the empty revs. It's any operations
that try to scan or assemble information from the revisions. 5000 empty
"objects" is still a logistical burden, especially if assembling any kind
of change history for the new repository. And since the new repositories
are effectively a rebase of a subset of the code, you don't normally *gain*
anything from having empty revisions for code that is in the other new
repositories. You can't meaningfully merge content between the new smaller
repositories and the old repo, barring some seriously weird cases, so it's
safer to treat them as completely distinct and not bother to preserve all
the empty revisions.

The "revision numbers are stored in support tickets" is the only reason I
can think of to keep them.

On Tue, Sep 10, 2013 at 11:35 AM, Les Mikesell <le...@gmail.com> wrote:

> On Tue, Sep 10, 2013 at 6:22 AM, Nico Kadel-Garcia <nk...@gmail.com>
> wrote:
> >>
> > Even if the history is considered sacrosanct (and this is often a
> > theological policy, not an engineering one!), an opportunity to reduce
> > the size of each repository by discarding deadwood at switchover time
> > should be taken seriously.
>
> Those empty revs take what, a couple of dollars worth of disk space
> (OK, x3 or 4 for backups...), vs. how much human time will it take to
> make everyone involved understand that you use one procedure for
> revisions before a certain date, and a different one after, and to get
> diffs between them you have to either check out both copies and use
> local tools or map the rev number from your old reference to the new
> numbering scheme?   And then there are likely to be pegged externals
> to pull in components that you'll have to fix even if they stay within
> the same project repo and use relative notation.   I'd call not
> unnecessarily changing the history you use a version control system to
> preserve to be 'philosophically correct'  as opposed to a theological
> requirement.  If your engineering choices were always right the first
> time, you probably wouldn't have all these revisions in the first
> place.
>
> --
>    Les Mikesell
>       lesmikesell@gmail.com
>

Re: Breaking up a monolithic repository

Posted by Les Mikesell <le...@gmail.com>.

On Tue, Sep 10, 2013 at 6:22 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
>>
> Even if the history is considered sacrosanct (and this is often a
> theological policy, not an engineering one!), an opportunity to reduce the
> size of each repository by discarding deadwood at switchover time should be
> taken seriously.

Those empty revs take what, a couple of dollars worth of disk space
(OK, x3 or 4 for backups...), vs. how much human time will it take to
make everyone involved understand that you use one procedure for
revisions before a certain date, and a different one after, and to get
diffs between them you have to either check out both copies and use
local tools or map the rev number from your old reference to the new
numbering scheme?   And then there are likely to be pegged externals
to pull in components that you'll have to fix even if they stay within
the same project repo and use relative notation.   I'd call not
unnecessarily changing the history you use a version control system to
preserve to be 'philosophically correct'  as opposed to a theological
requirement.  If your engineering choices were always right the first
time, you probably wouldn't have all these revisions in the first
place.

-- 
   Les Mikesell
      lesmikesell@gmail.com

RE: Breaking up a monolithic repository

Posted by Bob Archer <Bo...@amsi.com>.

> Am 10.09.2013 19:45, schrieb Thomas Harold:
> 
> > When we moved from a monolithic repository to per-client repositories
> > a few years ago, we went ahead and:
> >
> > - Rebased the paths up one or two levels (old system was something
> > like "monolithicrepo/[a-z]/[client directories]/[job directory]") so
> > that the urls were now "clientrepo/[job directory]".  That was a
> > tricky thing to do and we had to 'sed' the output of the dump filter
> > before importing it back.
> >
> > It broke a few things, such as svn:externals which were not
> > relative-pathed, but was worth it in the long run so that our URLs got
> > shorter.
> >
> > - Made sure that the new repos all had unique UUIDs.
> >
> > - Renumbered all of the resulting revisions as we loaded things back in.
> >   But we didn't have to deal with any bug tracking systems that
> > referred to a specific revision.  And having lower revision numbers
> > was preferred, along with dropping revisions that referred to other projects.
> 
> I'm now facing the same problem. My users want the rebasing, but during the
> dump/load instead of after the fact (apparently, it causes issues with their
> environment when they need to go back to an earlier revision to reproduce
> something). They also want to keep the empty revisions (for references from
> the issue tracker).

Wouldn't it be much simpler to keep the current repository as a read-only archive and move the HEAD of each project into its own repo?


> I haven't tried it with svnadmin dump followed by svndumpfilter (I don't think it
> has that capability).
> 
> I've tried svnrdump (from svn 1.7), it resulted in either a new repository with
> the full path included (rdump/load all revs) or an interesting failure mode with
> a missing node during a copy operation when rdump -r
> <revision_after_path>:HEAD was used
> 
> I've also tried using svnsync, but that also results in the full path included, no
> rebasing.
> 
> How did you do it? Also, am I missing something that has been included in a
> current svn version?
> 
> Cheers,
> 
> Ulli

Re: Breaking up a monolithic repository

Posted by Thomas Harold <th...@nybeta.com>.

On 10/2/2013 10:36 AM, Ullrich Jans wrote:
>
> I'm now facing the same problem. My users want the rebasing, but during
> the dump/load instead of after the fact (apparently, it causes issues
> with their environment when they need to go back to an earlier revision
> to reproduce something). They also want to keep the empty revisions (for
> references from the issue tracker).
>
> I haven't tried it with svnadmin dump followed by svndumpfilter (I don't
> think it has that capability).

The commands we ended up using back in May 2011 when we did this looked
like the following.  It's been two years, but I'm pretty sure these two
scripts are all we ended up using.

- We had a master dump of the entire brc-jobs repository.
- Target repository name was "brc-jobs-zp" (CLCODE)
- It takes the dump and splits it into a smaller chunk (CLPATH).
- Had to edit the script for each new client/path that we wanted to 
split out.

It does *not* attempt to rebase the individual projects up to the root
directory.  It *is* possible by using 'sed' to do this in the resulting
dump file, but it is tricky.

----------------------------------------
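#!/bin/bash
# Split one client path out of the monolithic brc-jobs repository into
# its own gzip-compressed dump file.

# Master dump of the whole repository. Not read below: this script dumps
# the live repository directly.
SOURCE=/mnt/scratch/svn-dump-brc-jobs.may2011.dump.gz

# Output file name is ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}.
DESTDIR=/var/svn/
DESTPFX=svn-raw-brc-jobs-
DESTSFX=10xx.dump.gz

# Edit these two for each client/path being split out.
CLCODE=zp
CLPATH=Z/ZP_SingleJobs

# svndumpfilter options, kept in an array so each stays a separate word.
SDFOPTS=(--drop-empty-revs --renumber-revs)

date

echo "${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}"

# Dump everything, keep only CLPATH, compress the result.
svnadmin dump --quiet /var/svn/brc-jobs | \
     svndumpfilter include --quiet "${SDFOPTS[@]}" "$CLPATH" | \
     gzip > "${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}"

date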
----------------------------------------

The mirror to this was the script that created the new SVN repository 
and loads in the individual dump.

Note the commented out 'sed' lines where we attempted to rebase 
individual project folders back up to the root of the repository.  They 
didn't work, so we ended up just doing a move operation in the 
TortoiseSVN repository browser.

- It changes the UUID of the newly created repository to be something 
unique instead of using the old repo's UUID.
- Had to be edited anew for each new client/path.

----------------------------------------
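#!/bin/bash
# Load one client's dump into its own repository, give that repository a
# fresh UUID, pack it, and fix permissions.
# The script as posted has no 'svnadmin create' step, so the target
# repository must already exist.

SRCDIR=/var/svn/
SRCPFX=svn-raw-brc-jobs-
SRCSFX=10xx.dump.gz

# Only used by the commented-out rebase attempt below.
DESTDIR=/var/svn/
DESTPFX=svn-newbase-brc-jobs-
DESTSFX=10xx.dump.gz

# Not used below; the filtering already happened in the dump script.
SDFOPTS='--quiet --drop-empty-revs --renumber-revs'

# Edit these two for each client/path.
CLPARENT=Z
CLCODE=zp

REPO=/var/svn/brc-jobs-${CLCODE}

date

# Abandoned attempt to rebase project folders up to the repository root.
# Note that CLPATH is not set anywhere in this script.
#gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
#sed "s/Node-path: $CLPATH\//Node-path: /" | \
#sed "s/Node-copyfrompath: $CLPATH\//Node-copyfrompath: /" | \
#gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

# The filtered dump does not contain the parent directory itself, so
# create it before loading.
svn mkdir -m "Import from brc-jobs" \
    "file://${REPO}/${CLPARENT}"

gunzip -c "${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX}" | \
   svnadmin load --quiet "$REPO"

# Print the UUID before and after replacing it with a new one.
svnlook uuid "$REPO"
svnadmin setuuid "$REPO"
svnlook uuid "$REPO"
svnadmin pack "$REPO"

chmod -R 775 "$REPO"
chmod -R g+s "$REPO/db"
chgrp -R svn-brc-jobs "$REPO"

date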
----------------------------------------

I do wish I could have figured out the 'sed' commands to move a project 
from /Z/ZP_SingleJobs/JOBNR to be just /JOBNR in the repository, but 
there wasn't time.

For rebasing, that's probably your missing piece... which I don't have.
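
For what it's worth, a rough sketch of what the 'sed' rebase could look like. This is untested against a real repository and makes several assumptions: the function name `rebase_dump_headers` is invented, CLPATH contains no regex metacharacters and no '|', and no file content in the dump has a line that starts with one of these headers (sed would rewrite it and corrupt the stream). Compared with the commented-out lines in the script, it uses '|' as the delimiter because CLPATH contains '/', and it uses the header name Node-copyfrom-path, which is how the dump format spells it.

```shell
#!/bin/sh
# Path prefix to strip, as in the scripts above.
CLPATH=Z/ZP_SingleJobs

# Sketch: rewrite the path headers of a dump stream (stdin to stdout) so
# that Z/ZP_SingleJobs/JOBNR/... becomes JOBNR/...
# Only headers anchored at the start of a line are touched.
rebase_dump_headers() {
    sed -e "s|^Node-path: ${CLPATH}/|Node-path: |" \
        -e "s|^Node-copyfrom-path: ${CLPATH}/|Node-copyfrom-path: |"
}

# Possible usage (not run here; file names are placeholders):
#
#   gunzip -c svn-raw-brc-jobs-zp10xx.dump.gz |
#   rebase_dump_headers |
#   svnadmin load --quiet /var/svn/brc-jobs-zp
```

The records that add Z/ZP_SingleJobs itself have no trailing slash and are left untouched, so they would still need their parent directory (or would need to be dropped) before the load succeeds.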

Re: Breaking up a monolithic repository

Posted by Ullrich Jans <ul...@elektrobit.com>.

Am 10.09.2013 19:45, schrieb Thomas Harold:

> When we moved from a monolithic repository to per-client repositories a
> few years ago, we went ahead and:
>
> - Rebased the paths up one or two levels (old system was something like
> "monolithicrepo/[a-z]/[client directories]/[job directory]") so that the
> urls were now "clientrepo/[job directory]".  That was a tricky thing to
> do and we had to 'sed' the output of the dump filter before importing it
> back.
>
> It broke a few things, such as svn:externals which were not
> relative-pathed, but was worth it in the long run so that our URLs got
> shorter.
>
> - Made sure that the new repos all had unique UUIDs.
>
> - Renumbered all of the resulting revisions as we loaded things back in.
>   But we didn't have to deal with any bug tracking systems that referred
> to a specific revision.  And having lower revision numbers was
> preferred, along with dropping revisions that referred to other projects.

I'm now facing the same problem. My users want the rebasing, but during 
the dump/load instead of after the fact (apparently, it causes issues 
with their environment when they need to go back to an earlier revision 
to reproduce something). They also want to keep the empty revisions (for 
references from the issue tracker).

I haven't tried it with svnadmin dump followed by svndumpfilter (I don't 
think it has that capability).

I've tried svnrdump (from svn 1.7), it resulted in either a new 
repository with the full path included (rdump/load all revs) or an 
interesting failure mode with a missing node during a copy operation 
when rdump -r <revision_after_path>:HEAD was used

I've also tried using svnsync, but that also results in the full path 
included, no rebasing.

How did you do it? Also, am I missing something that has been included 
in a current svn version?

Cheers,

Ulli

-- 
Ullrich Jans, Specialist, IT-A
Phone: +49 9131 7701-6627, mailto:ullrich.jans@elektrobit.com
Fax: +49 9131 7701-6333, www.elektrobit.com

Elektrobit Automotive GmbH, Am Wolfsmantel 46, 91058 Erlangen, Germany
Managing Directors: Alexander Kocher, Gregor Zink
Register Court Fürth HRB 4886

Re: Breaking up a monolithic repository

Posted by Thomas Harold <th...@nybeta.com>.

On 9/10/2013 7:22 AM, Nico Kadel-Garcia wrote:
> But keeping thousands of empty commits in a project they're not relevant
> to is confusing and wasteful. The repository and repository URLs for
> the old project should be preserved, if possible, locked down and
> read-only, precisely for this kind of change history. But since the
> repository is being completely refactored *anyway*, it's a great
> opportunity to discard debris.

When we moved from a monolithic repository to per-client repositories a 
few years ago, we went ahead and:

- Rebased the paths up one or two levels (old system was something like 
"monolithicrepo/[a-z]/[client directories]/[job directory]") so that the 
urls were now "clientrepo/[job directory]".  That was a tricky thing to 
do and we had to 'sed' the output of the dump filter before importing it 
back.

It broke a few things, such as svn:externals which were not 
relative-pathed, but was worth it in the long run so that our URLs got 
shorter.

- Made sure that the new repos all had unique UUIDs.

- Renumbered all of the resulting revisions as we loaded things back in. 
  But we didn't have to deal with any bug tracking systems that referred 
to a specific revision.  And having lower revision numbers was 
preferred, along with dropping revisions that referred to other projects.

> Even if the history is considered sacrosanct (and this is often a
> theological policy, not an engineering one!), an opportunity to reduce
> the size of each repository by discarding deadwood at switchover time
> should be taken seriously.

Less of an issue now that svn 1.8 has revprop packing (plus the rev 
packing from 1.6).  That deadwood takes up a lot less space in terms of 
the number of files in the file system.

And the fact that svnadmin hotcopy is now incremental in 1.8 also makes 
it less of an issue.  Having a few thousand (tens of thousands) 
revisions in a repository is no longer a big bottleneck during the 
hotcopy process like it was before.

Our backup system is also a lot happier with fewer files to backup.

Re: Breaking up a monolithic repository

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

> Have you checked if the users have/need anything (emails, ticket
> system, etc.) that refer to specific revisions or the history of
> changes made there?   It seems kind of drastic to throw that away
> because you think the numbers aren't pretty enough.


But keeping thousands of empty commits in a project they're not relevant to
is confusing and wasteful. The repository and repository URLs for the old
project should be preserved, if possible, locked down and read-only,
precisely for this kind of change history. But since the repository is
being completely refactored *anyway*, it's a great opportunity to discard
debris.

Even if the history is considered sacrosanct (and this is often a
theological policy, not an engineering one!), an opportunity to reduce the
size of each repository by discarding deadwood at switchover time should
be taken seriously.

Re: Breaking up a monolithic repository

Posted by Les Mikesell <le...@gmail.com>.

On Tue, Sep 10, 2013 at 4:36 PM, Bob Archer <Bo...@amsi.com> wrote:
>
>> >>Also part of the reason to split up the  repos is to make access
>> >>control easier, and it looks bad if Alice (who  should have access to
>> >>project 1 but not project 2) can see Bob's old  commit metadata to
>> >>project 2, even if she can't see the commit bodies  after the split.
>> >
>> > How does this work now in the combined repository?
>>
>> Right now, they don't have it with the combined repo.  Anyone in the svn group
>> can read everything.  (This is one of the reasons they want to break up the
>> single repo into per-project repos.)
>
> You should knock the reason off the list. You can set up path-based authorization fairly easily (especially compared to breaking it up into multiple repos).
>

Unless you already have a central authentication source you'll have a
certain tradeoff in complexity between maintaining password control
for multiple repos vs. path-based control in a single one and if there
are external references where different groups use each others'
libraries it can be a little messy either way.

-- 
   Les Mikesell
    lesmikesell@gmail.com

RE: Breaking up a monolithic repository

Posted by Bob Archer <Bo...@amsi.com>.

> -----Original Message-----
> From: twb@elba.apache.org [mailto:twb@elba.apache.org] On Behalf Of Trent
> W. Buck
> Sent: Monday, September 09, 2013 11:38 PM
> To: users@subversion.apache.org
> Subject: Re: Breaking up a monolithic repository
> 
> Les Mikesell <le...@gmail.com> writes:
> 
> > On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck <tr...@gmail.com>
> wrote:
> >> Ryan Schmidt <su...@ryandesign.com> writes:
> >>
> >>> As someone used to Subversion's usually sequential revision numbers,
> >>> that bugs me aesthetically, but it works fine.
> >>
> >> I think that's the crux of it.
> >
> > Have you checked if the users have/need anything (emails, ticket
> > system, etc.) that refer to specific revisions or the history of
> > changes made there?   It seems kind of drastic to throw that away
> > because you think the numbers aren't pretty enough.
> 
> That is an extremely valid point.  I'll check.
> 
> >>Also part of the reason to split up the  repos is to make access
> >>control easier, and it looks bad if Alice (who  should have access to
> >>project 1 but not project 2) can see Bob's old  commit metadata to
> >>project 2, even if she can't see the commit bodies  after the split.
> >
> > How does this work now in the combined repository?
> 
> Right now, they don't have it with the combined repo.  Anyone in the svn group
> can read everything.  (This is one of the reasons they want to break up the
> single repo into per-project repos.)

You should knock the reason off the list. You can set up path-based authorization fairly easily (especially compared to breaking it up into multiple repos).
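
As an illustration only, a path-based authz file for the two example projects could look something like the sketch below. The group names, user names and file location are all made up, and how the file gets wired in depends on the server (AuthzSVNAccessFile for Apache, authz-db in svnserve.conf).

```shell
#!/bin/sh
# Hypothetical location for the authz file; adjust to the real setup.
AUTHZ_FILE="${AUTHZ_FILE:-./authz.example}"

# Write a small authz file: no access by default, then read/write per
# project for the matching group. Names are placeholders.
cat > "$AUTHZ_FILE" <<'EOF'
[groups]
muffins = alice
cakes = bob

# Nobody gets anything by default.
[/]
* =

[/projects/1_Muffins]
@muffins = rw

[/projects/2_Cakes]
@cakes = rw
EOF
```

With something like this in place, alice can read and write under projects/1_Muffins only and bob under projects/2_Cakes only.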

Bob

Re: Breaking up a monolithic repository

Posted by "Trent W. Buck" <tr...@gmail.com>.

Les Mikesell <le...@gmail.com> writes:

> On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck <tr...@gmail.com> wrote:
>> Ryan Schmidt <su...@ryandesign.com> writes:
>>
>>> As someone used to Subversion's usually sequential revision numbers,
>>> that bugs me aesthetically, but it works fine.
>>
>> I think that's the crux of it.
>
> Have you checked if the users have/need anything (emails, ticket
> system, etc.) that refer to specific revisions or the history of
> changes made there?   It seems kind of drastic to throw that away
> because you think the numbers aren't pretty enough.

That is an extremely valid point.  I'll check.

>> Also part of the reason to split up the
>> repos is to make access control easier, and it looks bad if Alice (who
>> should have access to project 1 but not project 2) can see Bob's old
>> commit metadata to project 2, even if she can't see the commit bodies
>> after the split.
>
> How does this work now in the combined repository?

Right now, they don't have it with the combined repo.  Anyone in the svn
group can read everything.  (This is one of the reasons they want to
break up the single repo into per-project repos.)

Re: Breaking up a monolothic repository

Posted by Les Mikesell <le...@gmail.com>.

On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck <tr...@gmail.com> wrote:
> Ryan Schmidt <su...@ryandesign.com> writes:
>
>> As someone used to Subversion's usually sequential revision numbers,
>> that bugs me aesthetically, but it works fine.
>
> I think that's the crux of it.

Have you checked if the users have/need anything (emails, ticket
system, etc.) that refer to specific revisions or the history of
changes made there?   It seems kind of drastic to throw that away
because you think the numbers aren't pretty enough.

> Also part of the reason to split up the
> repos is to make access control easier, and it looks bad if Alice (who
> should have access to project 1 but not project 2) can see Bob's old
> commit metadata to project 2, even if she can't see the commit bodies
> after the split.

How does this work now in the combined repository?

-- 
   Les Mikesell
      lesmikesell@gmail.com

Re: Breaking up a monolothic repository

Posted by "Trent W. Buck" <tr...@gmail.com>.

Ryan Schmidt <su...@ryandesign.com> writes:

> As someone used to Subversion's usually sequential revision numbers,
> that bugs me aesthetically, but it works fine.

I think that's the crux of it.  Also part of the reason to split up the
repos is to make access control easier, and it looks bad if Alice (who
should have access to project 1 but not project 2) can see Bob's old
commit metadata to project 2, even if she can't see the commit bodies
after the split.

Re: Breaking up a monolothic repository

Posted by Ryan Schmidt <su...@ryandesign.com>.

On Sep 9, 2013, at 07:31, Les Mikesell wrote:

> On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck wrote:
> 
>> I'm stuck.  Since it's no fun to have tens of thousands of empty revs
>> in each project repo, my current approach is to leave existing
>> projects in the monolithic repo, and new projects get separate repos.
> 
> Why do you think an empty rev will bother anyone any more in a
> per-project repo than having the rev number jump from a commit to an
> unrelated project does in the combined repo?    It shouldn't be a
> problem in either case.  Rev numbers for any particular use don't need
> to be sequential, you just need to know what they are.

This is true. Heck, if you use a DVCS like git or hg, every revision is identified by a SHA-1 hash that looks completely random. As someone used to Subversion's usually sequential revision numbers, that bugs me aesthetically, but it works fine.

There are also some reasons why keeping the revision number from the old monolithic repository in your new repositories (with empty padding revisions in between) is a really good idea. Have you ever referenced revision numbers in your issue tracker ("fixed in r111"; "r222 broke xyz") or in emails ("can you explain what you did in r333"; "r444 is a great example of abc") or in commit messages ("reverted r555"; "added file forgotten in r666")? If so, you don't want to renumber revs, because that would invalidate all those references.
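
One rough way to check whether such references exist in the commit messages themselves is to scan the log for `rNNN` tokens. This is only a sketch: the helper name `extract_revrefs` is made up here, it skips the `rNNN | author | date | N lines` header lines that `svn log` prints, and it will also match unrelated tokens (part numbers and the like) that happen to look like `r123`.

```shell
# extract_revrefs: read text on stdin, print each distinct rNNN token once.
# (Helper name is hypothetical; grep -o and -w are widely supported
# extensions rather than strict POSIX.)
extract_revrefs() {
    # Drop "svn log" header lines, then pull out whole-word rNNN tokens.
    grep -vE '^r[0-9]+ \| ' | grep -owE 'r[0-9]+' | sort -u
}

# Example usage against the monolithic repo from the first post:
#   svn log file:///srv/svn/Frobozz | extract_revrefs
```

References in the issue tracker or in old emails would need a separate search; this only covers commit messages.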

Re: Breaking up a monolothic repository

Posted by Les Mikesell <le...@gmail.com>.

On Mon, Sep 9, 2013 at 8:03 AM, Grierson, David
<Da...@bskyb.com> wrote:
> I can see Trent's viewpoint that people are weird and get freaked out by the unexpected (where they might expect the revision numbers to be relatively low).
>

I could see that for someone who had never used subversion before and
did not understand the concept of global revision numbers, but not for
anyone who has used a multi-project repository.

> I guess what we should be providing him with are points like the ones you make, to help him sell why this isn't an issue to the end users.
>
> Like Les says, if someone performs a large batch of commits to a particular branch then the trunk revision numbers are going to leap forward (unexpectedly). So the thing to sell to folks concerned about it is that they're experiencing this already.

Revision numbers aren't something you guess at or expect anything
from.  They are only useful in terms of the repository history, and it
doesn't matter if your project runs sequentially or not.   If you want
names/numbers that make human sense, you'll be copying to tags for
easier reference anyway.

-- 
    Les Mikesell
      lesmikesell@gmail.com

RE: Breaking up a monolothic repository

Posted by "Grierson, David" <Da...@bskyb.com>.

I can see Trent's viewpoint that people are weird and get freaked out by the unexpected (where they might expect the revision numbers to be relatively low).

I guess what we should be providing him with are points like the ones you make, to help him sell why this isn't an issue to the end users.

Like Les says, if someone performs a large batch of commits to a particular branch then the trunk revision numbers are going to leap forward (unexpectedly). So the thing to sell to folks concerned about it is that they're experiencing this already.
--
David Grierson - SDLC Tools Specialist 
Sky Broadcasting - Customer Business Systems - SDLC Tools
Tel: +44 1506 325100 / Email: David.Grierson@bskyb.com / Chatter: CBS SDLC Tools
Watermark Building, Alba Campus, Livingston, EH54 7HH

> -----Original Message-----
> From: Les Mikesell [mailto:lesmikesell@gmail.com]
> Sent: 09 September 2013 13:32
> To: Trent W. Buck
> Cc: Subversion
> Subject: Re: Breaking up a monolothic repository
> 
> On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck <tr...@gmail.com> wrote:
> >
> > I'm stuck.  Since it's no fun to have tens of thousands of empty revs
> > in each project repo, my current approach is to leave existing
> > projects in the monolithic repo, and new projects get separate repos.
> >
> 
> Why do you think an empty rev will bother anyone any more in a
> per-project repo than having the rev number jump from a commit to an
> unrelated project does in the combined repo?    It shouldn't be a
> problem in either case.  Rev numbers for any particular use don't need
> to be sequential, you just need to know what they are.
> 
> --
>    Les Mikesell
>      lesmikesell@gmail.com

Re: Breaking up a monolothic repository

Posted by Les Mikesell <le...@gmail.com>.

On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck <tr...@gmail.com> wrote:
>
> I'm stuck.  Since it's no fun to have tens of thousands of empty revs
> in each project repo, my current approach is to leave existing
> projects in the monolithic repo, and new projects get separate repos.
>

Why do you think an empty rev will bother anyone any more in a
per-project repo than having the rev number jump from a commit to an
unrelated project does in the combined repo?    It shouldn't be a
problem in either case.  Rev numbers for any particular use don't need
to be sequential, you just need to know what they are.

-- 
   Les Mikesell
     lesmikesell@gmail.com

Re: Breaking up a monolothic repository

Posted by Thorsten Schöning <ts...@am-soft.de>.

Hello Trent W. Buck,
on Tuesday, 10 September 2013 at 02:49 you wrote:

> ...hm, still 1.6.  Is it worth me backporting a newer svn?

I would give it a try: get yourself a current build of 1.8, dump your
old repo, load it into a new one created with your 1.8 version, and
see how much space is saved. The version information you posted looks
current enough to already use representation sharing, but depending on
how the upgrades were made (svnadmin upgrade vs. a full dump/load
cycle), there may be old duplicate data in the repo that was created
before svnadmin upgrade was run. Besides that, 1.8 made improvements
to reduce disk space, too.
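
A sketch of the dump/load cycle described above, assuming the 1.8 `svnadmin` is the one found first in `PATH`. The function name and the example target path are made up; the source path is the one posted earlier in the thread.

```shell
# reload_repo OLD NEW: create NEW with the svnadmin found in PATH, replay
# OLD's full history into it, then print both sizes for comparison.
# (Function name is hypothetical.)
reload_repo() {
    old=$1
    new=$2
    svnadmin create "$new" || return 1
    svnadmin dump --quiet "$old" | svnadmin load --quiet "$new" || return 1
    du -sh "$old" "$new"
}

# Example (target path is a placeholder):
#   reload_repo /home/svn/PI /home/svn/PI-1.8
```

A dump/load cycle carries over revisions and revision properties only; hook scripts and the files under `conf/` have to be copied across by hand.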

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail:Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow

Re: Breaking up a monolothic repository

Posted by Thomas Harold <th...@nybeta.com>.

On 9/9/2013 8:49 PM, Trent W. Buck wrote:
>
> I'm partway through provisioning the replacement Debian 7 server, which
> will have
>
>      subversion 1.6.17dfsg-4+deb7u3
>      apache2    2.2.22-13
>
> ...hm, still 1.6.  Is it worth me backporting a newer svn?
>

Yes, it's worth installing 1.8.3.

http://www.wandisco.com/subversion/download#debian7

Re: Breaking up a monolothic repository

Posted by "Trent W. Buck" <tr...@gmail.com>.

Thorsten Schöning <ts...@am-soft.de> writes:

> Tell us about the size of your repo, its format version and primary
> data types versioned

(Sorry for not giving this info earlier, and shifting the goal posts --
I personally went rcs->arch->darcs->git and never really used svn, so
I'm feeling pretty noob attacking this problem.)

du reports it is 18GiB.  The current revno is 16115.

    $ grep . /home/svn/PI/{format,db/fs-type,db/format}
    /home/svn/PI/format:5
    /home/svn/PI/db/fs-type:fsfs
    /home/svn/PI/db/format:4
    /home/svn/PI/db/format:layout sharded 1000

As to what kind of files are in there -- I'm not actually sure.
Just doing a dumb look at HEAD's list of files,

    $ svn ls -R file:///home/svn/PI | wc -l
    269281

And looking at the most common extensions:

    $ svn ls -R file:///home/svn/PI | sed -n 's/.*\.//p' |
      sort | uniq -c | sort -nr | head -20
      36581 h                          2438 txt
      21732 patch                      2375 sh
      17621 html                       2362 i
      15023 c                          2121 bmp
       8143 py                         1957 mk
       3919 cpp                        1932 po
       3559 png                        1916 class
       3074 gif                        1813 lua
       2950 xml                        1742 cs
       2585 properties                 1613 hpp

Obviously that's not weighted by size, and completely ignores anything
that's not in HEAD anymore.

                               *   *   *

It's currently hosted on an Ubuntu 10.04 server, so my server svn is
quite old:

    subversion 1.6.6dfsg-2ubuntu1.3
    apache2    2.2.14-5ubuntu8.12

I believe some of the users have svn 1.7 on their desktops, but not all.

I'm partway through provisioning the replacement Debian 7 server, which
will have

    subversion 1.6.17dfsg-4+deb7u3
    apache2    2.2.22-13

...hm, still 1.6.  Is it worth me backporting a newer svn?

Re: Breaking up a monolothic repository

Posted by Thorsten Schöning <ts...@am-soft.de>.

Hello Trent W. Buck,
on Monday, 9 September 2013 at 03:13 you wrote:

> What else can I do?

Tell us about the size of your repo, its format version and the
primary data types versioned, since you can always simply clone the
entire repo once for each project needed and then delete or move the
unneeded contents in each new project repo with a Subversion client.
The current format of the repo and its primary data types are
interesting because, if it's pretty old, current repo formats may need
significantly less disk space per repo, making the overhead of
duplicating the original one acceptable.
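
Roughly, the clone-and-prune approach could look like the sketch below. The function name, the target path and the directory names in the example are placeholders. `svnadmin setuuid` is included because a hotcopy keeps the original repository's UUID, and separate repositories normally should not share one.

```shell
# clone_and_prune SRC DST UNWANTED...: copy the whole repository, give the
# copy its own UUID, then delete each unwanted directory in a new commit.
# DST must be an absolute path, since it is used to build a file:// URL.
# (Function name is hypothetical.)
clone_and_prune() {
    src=$1
    dst=$2
    shift 2
    svnadmin hotcopy "$src" "$dst" || return 1
    svnadmin setuuid "$dst" || return 1
    for dir in "$@"; do
        svn rm -m "Remove $dir after repository split" "file://$dst/$dir" || return 1
    done
}

# Example (target path and directory names are placeholders):
#   clone_and_prune /home/svn/PI /home/svn/project133 EE projects/other
```

Note that the deleted directories stay in the copy's history, so anyone with read access to old revisions can still retrieve them; on its own this does not address the access-control concern raised elsewhere in the thread.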

Kind regards,

Thorsten Schöning
