Posted to dev@subversion.apache.org by Faheem Mitha <fa...@email.unc.edu> on 2003/06/09 19:06:30 UTC

backing up a repository over a network

Dear People,

I'm currently storing both my repository and my working copy on my home
computer. My connection to the net is currently dial-up, so I use rsync
over ssh to backup my data to a remote computer (remote) as follows.

rsync -avz --partial --delete -e ssh /home/faheem/svn/ remote:svn/

This is designed to produce an exact copy of the /svn directory. Now,
rsync doesn't know anything about subversion, so it is possible that it
could produce a copy that is a broken repository at the other end. In fact
this does happen, at least some of the time. I'm not sure why, because
rsync is usually very good at making exact copies.

In any case, I noticed there is a subversion script (hot-backup.py) which
does a repository backup that is intelligent, so less likely to give a
broken result. However, it seems designed for local use, and not for
networking.

So, I was wondering what was recommended for backing up over the network.

Furthermore, when doing these backups, I noticed an oddity. Suppose I

a) Run the backup script so local and remote repositories are the same.
b) Do a checkout from the remote repository.
c) Run the backup script again.

I would expect that no changes would be detected in c). In fact, the
remote repository appears to have changed, since the two are now
different, and the local repository certainly hasn't changed in the
meantime. This seems a little strange to me, since I can't see any reason
why doing a checkout should change a repository. Does it keep a note of
time of last access or log of activity or something like that?

                                                         Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Ben Collins <bc...@debian.org>.
> The DB format itself is labelled with the endianness it was created 
> with, and so can be moved.  Any data you put into the database is not 
> automatically converted.

No, I mean that bdb itself has never been able to use databases that I
move. It cannot even open the environment. Maybe it's different now.
It's been a long time since I tried it.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
Deqo       - http://www.deqo.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by William Uther <wi...@cse.unsw.edu.au>.
On Tuesday, June 10, 2003, at 11:55  AM, Ben Collins wrote:

> On Mon, Jun 09, 2003 at 10:43:51PM -0400, Greg Hudson wrote:
>> On Mon, 2003-06-09 at 15:51, cmpilato@collab.net wrote:
>>> I don't really understand the "svn unawareness" issue.  Files are
>>> files, right?  Binary data.  Ones and zeros.  Now, granted, if you
>>> move a Berkeley DB environment between some platforms, things'll be
>>> all busted up.  That's why Berkeley provides its own db_dump and
>>> db_load utilities, to take care of endianness issues between
>>> platforms.
>>
>> Not according to the docs:
>>
>>   http://www.sleepycat.com/docs/ref/am_conf/byteorder.html
>
> From personal experience, I think this is bogus info. I've never been
> able to transfer bdb databases between endian-different machines.

 From the above quoted link (in bold at the bottom):
> It is important to note that the Berkeley DB access methods do no data 
> conversion for application specified data.  Key/data pairs written on 
> a little-endian format architecture will be returned to the 
> application exactly as they were written when retrieved on a 
> big-endian format architecture.

The DB format itself is labelled with the endianness it was created 
with, and so can be moved.  Any data you put into the database is not 
automatically converted.

Will              :-}

--
Dr William Uther                         National ICT Australia
Phone: +61 2 9385 6926                   School of Computer Science and Engineering
Email: willu@cse.unsw.edu.au             University of New South Wales
Jabber: willu@jabber.cse.unsw.edu.au     Sydney, Australia


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Ben Collins <bc...@debian.org>.
On Mon, Jun 09, 2003 at 10:43:51PM -0400, Greg Hudson wrote:
> On Mon, 2003-06-09 at 15:51, cmpilato@collab.net wrote:
> > I don't really understand the "svn unawareness" issue.  Files are
> > files, right?  Binary data.  Ones and zeros.  Now, granted, if you
> > move a Berkeley DB environment between some platforms, things'll be
> > all busted up.  That's why Berkeley provides its own db_dump and
> > db_load utilities, to take care of endianness issues between
> > platforms.
> 
> Not according to the docs:
> 
>   http://www.sleepycat.com/docs/ref/am_conf/byteorder.html

From personal experience, I think this is bogus info. I've never been
able to transfer bdb databases between endian-different machines.

Re: backing up a repository over a network

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2003-06-09 at 15:51, cmpilato@collab.net wrote:
> I don't really understand the "svn unawareness" issue.  Files are
> files, right?  Binary data.  Ones and zeros.  Now, granted, if you
> move a Berkeley DB environment between some platforms, things'll be
> all busted up.  That's why Berkeley provides its own db_dump and
> db_load utilities, to take care of endianness issues between
> platforms.

Not according to the docs:

  http://www.sleepycat.com/docs/ref/am_conf/byteorder.html


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by "Jamin W. Collins" <jc...@asgardsrealm.net>.
On Mon, Jun 09, 2003 at 04:50:17PM -0400, Paul L Lussier wrote:
> In a message dated: Mon, 09 Jun 2003 14:35:18 MDT
> "Jamin W. Collins" said:
> 
> > This may be a little off topic for this thread, but how would one go
> > about maintaining several writable copies of the same SVN
> > repository, keeping them all in sync with each other?  Is it even
> > possible?
> 
> Sadly, this is not (yet) a feature of svn.  You might be able to hack 
> something together with a combination of svnadmin dump/restore and 
> rsync, but you're going to be living dangerously.  Especially if you 
> intend to have them all 'active' simultaneously.

Yeah, that's what I was wondering about.  The logistics of it just keep
making my brain hurt, so I figured I'd ask here and see whether I was
overthinking the situation.  Keeping a relatively hot standby doesn't
seem too difficult; it's the simultaneously-active bit that seems
tremendously difficult.

-- 
Jamin W. Collins

Remember, root always has a loaded gun.  Don't run around with it unless
you absolutely need it. -- Vineet Kumar

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Paul L Lussier <pl...@lanminds.com>.
In a message dated: Mon, 09 Jun 2003 14:35:18 MDT
"Jamin W. Collins" said:

>This may be a little off topic for this thread, but how would one go
>about maintaining several writable copies of the same SVN repository,
>keeping them all in sync with each other?  Is it even possible?

Sadly, this is not (yet) a feature of svn.  You might be able to hack 
something together with a combination of svnadmin dump/restore and 
rsync, but you're going to be living dangerously.  Especially if you 
intend to have them all 'active' simultaneously.
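
For what it's worth, a rough one-way sketch of that kind of hack (the
paths and the 'remote' host are placeholders, and it only gives you a
mirror that has to be rebuilt from the dump, not two live masters):

   # on the machine with the live repository
   svnadmin dump /path/to/repos > repos.dump
   rsync -avz -e ssh repos.dump remote:repos.dump

   # on the other machine, rebuild the mirror from the dump
   svnadmin create /path/to/mirror
   svnadmin load /path/to/mirror < repos.dump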

Someday, I hope to see svn gain a 'clone' feature similar to BitKeeper's.
For now, I just keep dreaming of what that might be like :)
-- 

Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853  E808 BB07 9239 53F1 28EE

	It may look like I'm just sitting here doing nothing,
   but I'm really actively waiting for all my problems to go away.

	 If you're not having fun, you're not doing it right!



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by "Jamin W. Collins" <jc...@asgardsrealm.net>.
On Mon, Jun 09, 2003 at 03:23:43PM -0500, cmpilato@collab.net wrote:
> Faheem Mitha <fa...@email.unc.edu> writes:
> 
> > Now, it could be that by bad luck the repository is changing at the same
> > time as the backup is performed, and that could be all of the problem. I'm
> > not actually sure when the repository changes. I thought earlier it would
> > only change during commits (which I don't do that often), but as I wrote
> > earlier, this does not seem to be the case. Would anyone like to share
> > with me under what circumstances the repository would change (and why)
> > aside for commits?
> 
> We use Berkeley's transaction and log system for everything, and we
> use Subversion transactions as part of 'svn update'.  So even updates
> (and therefore checkouts, status -u, diff, switch ...) are writing to
> the database.

This may be a little off topic for this thread, but how would one go
about maintaining several writable copies of the same SVN repository,
keeping them all in sync with each other?  Is it even possible?

-- 
Jamin W. Collins

Remember, root always has a loaded gun.  Don't run around with it unless
you absolutely need it. -- Vineet Kumar

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Michael Wood <mw...@its.uct.ac.za>.
On Mon, Jun 09, 2003 at 06:00:41PM -0400, Faheem Mitha wrote:
[snip]
> Hang on a second though. In my example (in my first message) I had
> completed the checkout on the remote repository before I ran the rsync
> backup script again. So if there were temporary files/dirs created, why
> were they not deleted and the repository returned to its previous state?
> Or is this simply not guaranteed for some reason?

The "repository" is the same, but the BDB log files keep a log of the
"copy this stuff" and "delete it again" transaction.  So, I think the
strings table etc. should be the same, but the log files will definitely
NOT be the same.  (i.e. they will be bigger.)
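
If you are curious, Berkeley DB's db_archive utility will list those
log files (the path is a placeholder; the BDB environment lives in the
db/ subdirectory of the repository):

  # show all of the environment's log file names
  db_archive -l -h /path/to/repos/db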

-- 
Michael Wood <mw...@its.uct.ac.za>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Bruce Elrick <br...@entropyreduction.ca>.
You should also be aware that any database system will manage its
allocated physical space as it sees fit, so it is not surprising that
the operating-system files that provide the physical storage change
whenever any action against the database occurs.

Creating temp tables (or even inserting temporary rows into a fixed
table) and then deleting the table (or rows) may cause blocks to be
updated when the tables are created (rows inserted) and updated again
when they are deleted (rows deleted), such that the resulting blocks
are not the same as they originally were.  For example, the DB design
might be that every time a block is updated, a sequentially increasing
stamp is written into the block, and that stamp is tied to the logs (a
log sequence number?).

This meta-data the database uses to manage the physical space is pretty 
well guaranteed to change even if the two logical operations are 
expected to bring the DB back to its original logical state (rows 
inserted then deleted, or table added then dropped).

This is even more the case with a server-based RDBMS.  BDB is a
library-based one, where the application uses linked library calls
which manage the DB (in other words, the application processes access
the DB files directly).  A DB like Oracle or DB2 uses a standalone DB
server process (actually a set of cooperating processes) that manages
the physical store, and applications access it via network connections
to the RDBMS server processes.  It is possible that the RDBMS server
processes are updating meta-data in the physical files even when no
application processes are connected to the RDBMS server process (think
of garbage-collection-like operations).

Using something like rsync (or even cp) on the physical files of a DB
while the DB is "up" or "active" will usually require the DB recovery
process to be run because of those metadata issues.  If the DB is down
(in BDB's case, when there are no application processes with the DB
files open; in Oracle's case, when the server engine is stopped), then
a file copy will give you a clean copy of the DB, but as you've
observed, there will be meta-data differences that rsync will have to
transmit even when the logical state is the same.

Cheers...
Bruce


Faheem Mitha wrote:
> 
> On 9 Jun 2003, Ben Collins-Sussman wrote:
> 
> 
>>If you want more detail than this, you need to read the BDB docs at
>>sleepycat.com.  :-)
> 
> 
> That is a very complete description. Thanks for taking the trouble to
> reply in detail. I suppose the database must log every action before it
> performs it. I did not consider the possibility of a journalling
> filesystem style setup, though it makes perfect sense now that I think
> about it, to protect against crashes etc. So the differences in the
> repository I saw were presumably due to the changed logs.
> 
> Also, thanks to cmpilato for taking the trouble to reply to my earlier
> questions. Sorry to trouble you guys. :-)
> 
>                                                          Faheem.
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by cm...@collab.net.
Faheem Mitha <fa...@email.unc.edu> writes:

> On 9 Jun 2003, Ben Collins-Sussman wrote:
> 
> > If you want more detail than this, you need to read the BDB docs at
> > sleepycat.com.  :-)
> 
> That is a very complete description. Thanks for taking the trouble to
> reply in detail. I suppose the database must log every action before it
> performs it. I did not consider the possibility of a journalling
> filesystem style setup, though it makes perfect sense now that I think
> about it, to protect against crashes etc. So the differences in the
> repository I saw were presumably due to the changed logs.
> 
> Also, thanks to cmpilato for taking the trouble to reply to my earlier
> questions. Sorry to trouble you guys. :-)

No no, it was no trouble at all.  And honestly, I didn't think you were
complaining.  You were just noting something that seemed odd to you,
in conjunction with a specific problem (not being able to back up your
repos).  I could help with the problem, but needed to defer on the
oddity for lack of time to invest on my part.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003, Ben Collins-Sussman wrote:

> If you want more detail than this, you need to read the BDB docs at
> sleepycat.com.  :-)

That is a very complete description. Thanks for taking the trouble to
reply in detail. I suppose the database must log every action before it
performs it. I did not consider the possibility of a journalling
filesystem style setup, though it makes perfect sense now that I think
about it, to protect against crashes etc. So the differences in the
repository I saw were presumably due to the changed logs.

Also, thanks to cmpilato for taking the trouble to reply to my earlier
questions. Sorry to trouble you guys. :-)

                                                         Faheem.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Ben Collins-Sussman <su...@collab.net>.
Faheem Mitha <fa...@email.unc.edu> writes:

> You misunderstand. This is not a complaint or request for help. I just
> wondered why, if the files created are temporary, that the repository
> appears to have changed even after the transaction (eg. `svn co ...') was
> completed. Clearly this is no problem as such.

Here's the detailed answer you're looking for.

Any time you write to Berkeley DB (BDB) tables, BDB writes information
into its private logs.  This is how BDB is able to implement low-level
transactions, roll them back, or restore the whole database to a
consistent state after a system crash.

So when you hear us talking about 'repository logs', it has absolutely
nothing to do with the 'svn log' command or commit logs.  We're
talking about BDB's internal logging facility.

If you administer an svn repository, you need to prune the BDB logs
from time to time using the 'db_archive' utility; otherwise they grow
forever.  If you never delete the BDB logs, then in theory you could
replay every change that has *ever* happened to the BDB tables from
the very beginning of time, but most people don't want or need that.
Normally you just want enough log files lying around so that you can
restore the database to the "last known good" state.
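
For example (the repository path is a placeholder; Subversion keeps its
BDB environment in the repository's db/ subdirectory, and db_archive
only touches logs that no live transaction still needs):

   # print the log files that are no longer in use
   db_archive -h /path/to/repos/db

   # actually remove them
   db_archive -d -h /path/to/repos/db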

Here are two more implications:

1. As cmpilato said earlier:  running 'svn up' creates a temporary
   tree in the repository, one which mirrors the working copy.  After
   the temporary tree is compared to the HEAD tree, the temporary tree
   is deleted.  So even though 'svn up' is a "read" operation from a
   user's perspective, it still involves writes to the BDB tables.
   That means you could create a repository, and if people never do
   anything but run 'svn co' or 'svn up', the BDB logfiles *will*
   still grow without bound, though very slowly.

2. To back up a BDB "environment" (directory containing BDB tables and
   logs) while the repository is "online" or "live" (being accessed),
   you need to follow a specific procedure (which is in the BDB
   documentation).  First, copy the entire directory elsewhere.  Then,
   go back and re-copy all the logfiles, because they may have been
   *changed* during the initial copy.  Then run 'db_recover' on the copy
   to make sure the logged actions are synchronized with the tables.
   This is what we mean by a "hot backup", and this is what our
   hot-backup.py script does.  If you run 'rsync' directly on a live
   repository, it's not going to do the re-copy and recover steps, and
   thus you're running into the problems you originally mentioned.
   Better to rsync the hot-backed-up copy instead (sketched below).
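
Here is that sketch with placeholder paths (this is the procedure
described in (2), not a transcript of what hot-backup.py actually runs):

   # 1. copy the whole repository; the BDB environment is the db/ subdir
   cp -a /path/to/repos /path/to/backup

   # 2. re-copy the log files, which may have changed during step 1
   cp /path/to/repos/db/log.* /path/to/backup/db/

   # 3. bring the copied logs and tables back into agreement
   db_recover -h /path/to/backup/db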

If you want more detail than this, you need to read the BDB docs at
sleepycat.com.  :-)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: backing up a repository over a network

Posted by "D.J. Heap" <dj...@dhiprovo.com>.
I don't think the temp tree is created in temporary files -- it uses
space within the repository database.  So it has not changed *logically*
after the temp area is discarded.  Physically (on disk) it has changed,
though not much for read operations, usually.

Of course, this is just my understanding from some code and docs perusal
and watching the database as I perform operations.

DJ

-----Original Message-----
From: Faheem Mitha [mailto:faheem@email.unc.edu] 
Sent: Monday, June 09, 2003 5:19 PM
To: SVN Dev List
Subject: Re: backing up a repository over a network

On 9 Jun 2003 cmpilato@collab.net wrote:

> Without seeing exactly what you did and in what order you did those
> things, I'm not able to really poke a guess at what ails you.  All I
> can say is that the folks at Sleepycat recommend a particular
> algorithm for making hot backups of Berkeley DB environments which is
> guaranteed to not screw stuff up, and it is that algorithm which
> hot-backup.py follows.

You misunderstand. This is not a complaint or request for help. I just
wondered why, if the files created are temporary, that the repository
appears to have changed even after the transaction (eg. `svn co ...')
was completed. Clearly this is no problem as such.

> If we can agree on two things,
>
>    (1) that the hot-backup.py algorithm is the only one which
>        guarantees a usable backup repository, and
>    (2) that rsync will faithfully copy byte-for-byte filesystem
>        segments across a network,
>
> then I think we have a recipe for scratching your itch.  To use
> another non-Sleepycat-sanctioned backup method on a live repository
> takes us into the realm of academic discussion -- not my favorite
> flavor. :-)

Ok. I quite understand if you don't want to have an academic discussion.
I guess I've spent too long in universities. :-)

                                                          Faheem.






---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003 cmpilato@collab.net wrote:

> Without seeing exactly what you did and in what order you did those
> things, I'm not able to really poke a guess at what ails you.  All I can
> say is that the folks at Sleepycat recommend a particular algorithm for
> making hot backups of Berkeley DB environments which is guaranteed to
> not screw stuff up, and it is that algorithm which hot-backup.py
> follows.

You misunderstand. This is not a complaint or request for help. I just
wondered why, if the files created are temporary, that the repository
appears to have changed even after the transaction (eg. `svn co ...') was
completed. Clearly this is no problem as such.

> If we can agree on two things,
>
>    (1) that the hot-backup.py algorithm is the only one which
>        guarantees a usable backup repository, and
>    (2) that rsync will faithfully copy byte-for-byte filesystem
>        segments across a network,
>
> then I think we have a recipe for scratching your itch.  To use
> another non-Sleepycat-sanctioned backup method on a live repository
> takes us into the realm of academic discussion -- not my favorite
> flavor. :-)

Ok. I quite understand if you don't want to have an academic discussion.
I guess I've spent too long in universities. :-)

                                                          Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by cm...@collab.net.
Faheem Mitha <fa...@email.unc.edu> writes:

> On 9 Jun 2003 cmpilato@collab.net wrote:
> 
> > Understand that it's not logging the checkout or update, per se.  The
> > repository database contains trees of files and dirs ("revision
> > trees").  When you do an update, we make another temporary tree, which
> > we store in the database, that has the same files as dirs as your
> > working copy ("transaction tree").  We then diff your transaction tree
> > against the revision tree you're updating to, and that's how the
> > repository knows what you need as part of your update.  Finally, we
> > delete the temporary tree.
> >
> > It's only the creation and removal of that temporary tree that is
> > loggy in the database, and generally, we're not talking about a whole
> > lot of write operations.  It's proportional to the amount of revision
> > variation in your working copy (like, how many paths are at different
> > revisions than their parents), not to the size of the update.
> 
> Hang on a second though. In my example (in my first message) I had
> completed the checkout on the remote repository before I ran the rsync
> backup script again. So if there were temporary files/dirs created, why
> were they not deleted and the repository returned to its previous state?
> Or is this simply not guaranteed for some reason?

Without seeing exactly what you did and in what order you did those
things, I'm not able to really poke a guess at what ails you.  All I
can say is that the folks at Sleepycat recommend a particular
algorithm for making hot backups of Berkeley DB environments which is
guaranteed to not screw stuff up, and it is that algorithm which
hot-backup.py follows.

If we can agree on two things,

   (1) that the hot-backup.py algorithm is the only one which
       guarantees a usable backup repository, and
   (2) that rsync will faithfully copy byte-for-byte filesystem
       segments across a network,

then I think we have a recipe for scratching your itch.  To use
another non-Sleepycat-sanctioned backup method on a live repository
takes us into the realm of academic discussion -- not my favorite
flavor. :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003 cmpilato@collab.net wrote:

> Understand that it's not logging the checkout or update, per se.  The
> repository database contains trees of files and dirs ("revision
> trees").  When you do an update, we make another temporary tree, which
> we store in the database, that has the same files as dirs as your
> working copy ("transaction tree").  We then diff your transaction tree
> against the revision tree you're updating to, and that's how the
> repository knows what you need as part of your update.  Finally, we
> delete the temporary tree.
>
> It's only the creation and removal of that temporary tree that is
> loggy in the database, and generally, we're not talking about a whole
> lot of write operations.  It's proportional to the amount of revision
> variation in your working copy (like, how many paths are at different
> revisions than their parents), not to the size of the update.

Hang on a second though. In my example (in my first message) I had
completed the checkout on the remote repository before I ran the rsync
backup script again. So if there were temporary files/dirs created, why
were they not deleted and the repository returned to its previous state?
Or is this simply not guaranteed for some reason?

I'll experiment some more with `svn up' and `svn st -u'.
                                                           Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003 cmpilato@collab.net wrote:

> Faheem Mitha <fa...@email.unc.edu> writes:
>
> > Good heavens. If every checkout etc. is logged with the database, then it
> > must grow enormous over time. And I run `svn st -u' and `svn up' all the
> > time. If these change the repository, then that might account for the
> > problems I've been having. I don't see why it is necessary to log all this
> > activity, though.
> >
> > I see the Subversion Book refers to logs, but I thought they were logs of
> > commits. I didn't realise it was logging all this other stuff.
>
> Understand that it's not logging the checkout or update, per se.  The
> repository database contains trees of files and dirs ("revision
> trees").  When you do an update, we make another temporary tree, which
> we store in the database, that has the same files as dirs as your
> working copy ("transaction tree").  We then diff your transaction tree
> against the revision tree you're updating to, and that's how the
> repository knows what you need as part of your update.  Finally, we
> delete the temporary tree.
>
> It's only the creation and removal of that temporary tree that is
> loggy in the database, and generally, we're not talking about a whole
> lot of write operations.  It's proportional to the amount of revision
> variation in your working copy (like, how many paths are at different
> revisions than their parents), not to the size of the update.

Oh. I see. So nothing of the `svn up' operation is permanently stored?
That is a relief. I never thought about how `svn up' would be
accomplished, but I guess it would have to proceed as you say.

I was reading about the bubble-up method (in the design document)
yesterday, but I think that is about the actual commit. I'll see if it has
something about `svn up' etc.

                                                    Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by cm...@collab.net.
Faheem Mitha <fa...@email.unc.edu> writes:

> Good heavens. If every checkout etc. is logged with the database, then it
> must grow enormous over time. And I run `svn st -u' and `svn up' all the
> time. If these change the repository, then that might account for the
> problems I've been having. I don't see why it is necessary to log all this
> activity, though.
> 
> I see the Subversion Book refers to logs, but I thought they were logs of
> commits. I didn't realise it was logging all this other stuff.

Understand that it's not logging the checkout or update, per se.  The
repository database contains trees of files and dirs ("revision
trees").  When you do an update, we make another temporary tree, which
we store in the database, that has the same files as dirs as your
working copy ("transaction tree").  We then diff your transaction tree
against the revision tree you're updating to, and that's how the
repository knows what you need as part of your update.  Finally, we
delete the temporary tree.  

It's only the creation and removal of that temporary tree that is
loggy in the database, and generally, we're not talking about a whole
lot of write operations.  It's proportional to the amount of revision
variation in your working copy (like, how many paths are at different
revisions than their parents), not to the size of the update.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003 cmpilato@collab.net wrote:

> Faheem Mitha <fa...@email.unc.edu> writes:
>
> > Now, it could be that by bad luck the repository is changing at the same
> > time as the backup is performed, and that could be all of the problem. I'm
> > not actually sure when the repository changes. I thought earlier it would
> > only change during commits (which I don't do that often), but as I wrote
> > earlier, this does not seem to be the case. Would anyone like to share
> > with me under what circumstances the repository would change (and why)
> > aside for commits?
>
> We use Berkeley's transaction and log system for everything, and we
> use Subversion transactions as part of 'svn update'.  So even updates
> (and therefore checkouts, status -u, diff, switch ...) are writing to
> the database.

Good heavens. If every checkout etc. is logged with the database, then it
must grow enormous over time. And I run `svn st -u' and `svn up' all the
time. If these change the repository, then that might account for the
problems I've been having. I don't see why it is necessary to log all this
activity, though.

I see the Subversion Book refers to logs, but I thought they were logs of
commits. I didn't realise it was logging all this other stuff.

                                                          Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by cm...@collab.net.
Faheem Mitha <fa...@email.unc.edu> writes:

> Now, it could be that by bad luck the repository is changing at the same
> time as the backup is performed, and that could be all of the problem. I'm
> not actually sure when the repository changes. I thought earlier it would
> only change during commits (which I don't do that often), but as I wrote
> earlier, this does not seem to be the case. Would anyone like to share
> with me under what circumstances the repository would change (and why)
> aside for commits?

We use Berkeley's transaction and log system for everything, and we
use Subversion transactions as part of 'svn update'.  So even updates
(and therefore checkouts, status -u, diff, switch ...) are writing to
the database.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003 cmpilato@collab.net wrote:

> Faheem Mitha <fa...@email.unc.edu> writes:

> > But the problem that rsync is svn unaware is still the same.
>
> I don't really understand the "svn unawareness" issue.  Files are
> files, right?  Binary data.  Ones and zeros.  Now, granted, if you
> move a Berkeley DB environment between some platforms, things'll be
> all busted up.  That's why Berkeley provides its own db_dump and
> db_load utilities, to take care of endianness issues between
> platforms.
>
> But is there some other particular problem haunting you?

Well, in a nutshell, rsync *should* make a perfect copy of the repository,
which should Just Work. Sometimes, it doesn't. So I'm wondering why.

Now, it could be that by bad luck the repository is changing at the same
time as the backup is performed, and that could be the whole problem. I'm
not actually sure when the repository changes. I thought earlier it would
only change during commits (which I don't do that often), but as I wrote
earlier, this does not seem to be the case. Would anyone like to share
with me under what circumstances the repository would change (and why),
aside from commits?

Anyway, if that is the only problem, then what you suggest would solve it,
and I'll give it a try.

Oh, by the way, I'm copying from one Debian system to another, which are
identical in all significant respects from svn's point of view (yay
Debian) so platform differences should not be an issue.

> > Would it be very difficult to modify hot-backup.py to work over a
> > network?
>
> I dunno.  Care to give it a shot? :-)

Don't know. It looks difficult, and I suck at programming.

                                                          Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by cm...@collab.net.
Faheem Mitha <fa...@email.unc.edu> writes:

> On 9 Jun 2003 cmpilato@collab.net wrote:
> 
> > Faheem Mitha <fa...@email.unc.edu> writes:
> >
> > > In any case, I noticed there is a subversion script (hot-backup.py) which
> > > does a repository backup that is intelligent, so less likely to give a
> > > broken result. However, it seems designed for local use, and not for
> > > networking.
> >
> > Considered using a combination of hot-backup.py and rsync?  Make a hot
> > backup locally, and then rsync it to another machine.
> 
> I don't follow. I already have a repository locally. As I understand it
> (correct me if I am wrong) hot backup just clones my repository locally.
> So what is the advantage over directly rsyncing my (original)
> repository?

Well ... oh.  You already figured it out. :-)

> Oh, I suppose that protects against the possibility that I am writing to
> the repository at the same time. So it would remove one possible source of
> error. Ie.

Yep, that's it.

> Repository -> Local Backup (using hot-backup.py) -> Remote (using rsync)
> 
> Is this what you meant?

Exactly.

> But the problem that rsync is svn unaware is still the same. 

I don't really understand the "svn unawareness" issue.  Files are
files, right?  Binary data.  Ones and zeros.  Now, granted, if you
move a Berkeley DB environment between some platforms, things'll be
all busted up.  That's why Berkeley provides its own db_dump and
db_load utilities, to take care of endianness issues between
platforms.
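
A rough sketch of that dump-and-reload route, purely for illustration
(paths and file names are placeholders; you would have to repeat this
for each table in the repository's db/ directory, and nobody in this
thread has actually tested it):

  # on the source machine: dump one table to BDB's portable text format
  db_dump /path/to/repos/db/strings > strings.dump

  # on the destination machine: load it into the new environment
  db_load -f strings.dump /path/to/newrepos/db/strings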

But is there some other particular problem haunting you?  

> Would it be very difficult to modify hot-backup.py to work over a
> network?

I dunno.  Care to give it a shot? :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by Faheem Mitha <fa...@email.unc.edu>.

On 9 Jun 2003 cmpilato@collab.net wrote:

> Faheem Mitha <fa...@email.unc.edu> writes:
>
> > In any case, I noticed there is a subversion script (hot-backup.py) which
> > does a repository backup that is intelligent, so less likely to give a
> > broken result. However, it seems designed for local use, and not for
> > networking.
>
> Considered using a combination of hot-backup.py and rsync?  Make a hot
> backup locally, and then rsync it to another machine.

I don't follow. I already have a repository locally. As I understand it
(correct me if I am wrong) hot backup just clones my repository locally.
So what is the advantage over directly rsyncing my (original) repository?

Oh, I suppose that protects against the possibility that I am writing to
the repository at the same time. So it would remove one possible source of
error. Ie.

Repository -> Local Backup (using hot-backup.py) -> Remote (using rsync)

Is this what you meant?

But the problem that rsync is svn unaware is still the same. Would it
be very difficult to modify hot-backup.py to work over a network?

                                                        Faheem.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: backing up a repository over a network

Posted by cm...@collab.net.
Faheem Mitha <fa...@email.unc.edu> writes:

> In any case, I noticed there is a subversion script (hot-backup.py) which
> does a repository backup that is intelligent, so less likely to give a
> broken result. However, it seems designed for local use, and not for
> networking.

Considered using a combination of hot-backup.py and rsync?  Make a hot
backup locally, and then rsync it to another machine.
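
For example, reusing the rsync invocation from the original mail (the
svn-backup paths are placeholders, and this assumes hot-backup.py takes
the repository path followed by a backup directory; check its usage):

  # make a consistent local backup of the live repository first
  hot-backup.py /home/faheem/svn /home/faheem/svn-backup

  # then rsync the consistent backup rather than the live repository
  rsync -avz --partial --delete -e ssh /home/faheem/svn-backup/ remote:svn-backup/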

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org