Posted to users@subversion.apache.org by Phil Endecott <sp...@chezphil.org> on 2005/11/09 20:05:25 UTC

Update from post-commit hook

Dear All,

I have been thinking about using a post-commit hook to keep a public 
"latest snapshot" of part of the repository up to date.  This could be 
as simple as putting

	cd /latest/snapshot/; svn update

in the post-commit hook.  I've done some quick experiments which have 
not been very successful, so I have some questions:

- Where do error messages from the hook scripts go?  (Using Apache.)

- Access is normally via Apache; is the nested call to svn OK, or does 
/latest/snapshot/ need to be a file: checkout, or what?  (One of my 
experiments led to a runaway svn process, making me think that something 
recursive was going on.)

- (Possibly related to the above:) I require HTTP AUTH for both read and 
write to the repository.  How can the apache user, who runs the hook 
script, authenticate itself in the nested call?

- Only part of the repository is being tracked in this snapshot.  I 
could make the update conditional by checking if it has changed using 
svnlook, maybe something like: "svnlook changed | grep -q something && 
svn update".  But maybe an svn update when nothing has changed is just 
as fast - any comments?

- I don't want this to slow down commits if I can help it.  Is it OK to 
background the hook script, i.e. to have "svn update &" in the 
post-commit file?

Any suggestions much appreciated.

--Phil.





---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Update from post-commit hook

Posted by Dominic Anello <da...@danky.com>.
On 2005-11-15 08:05:55 -0700, Mark Parker wrote:
> Ryan Schmidt wrote:
> >It should be ok, but I would recommend using a file:/// checkout  
> >instead because it will be faster. Make sure you have a FSFS  
> >repository, as BDB repositories have problems with concurrent access  
> >over different protocols.
> 
> No, BDB doesn't have problems with multiple access methods (or at least 
> any problems that FSFS doesn't have). You run into problems when process 
> A creates files in the repository (usually logs for BDB) that are 
> unreadable/unwriteable for process B. This is (as far as I understand) 
> MORE likely to happen with FSFS, because EVERY COMMIT is GUARANTEED to 
> create one or more new files in the repository, while BDB logfiles are 
> only created when the last one fills up.
> 
> I've taken my information from the book 
> (http://svnbook.red-bean.com/nightly/en/svn.serverconfig.multimethod.html), 
> and if I've misinterpreted something here, please feel free to correct me.
> 
> Mark

BDB will create and log a transaction for reads as well as writes, so
it's possible for a "read only" operation such as svnlook or svnadmin
verify to generate a new log file.  I've personally wedged my repository
by carelessly running verify as root, for example.

You can try it yourself:
[svn@lynx ~/ec-svn/repo/db]$ ls -l log.*
-rw-rw-r--    1 svn      svn       1046553 Nov 14 15:17 log.0000002124
-rw-rw-r--    1 svn      svn        941420 Nov 15 13:47 log.0000002125
[svn@lynx ~/ec-svn/repo/db]$ svnlook history .. /trunk > /dev/null
[svn@lynx ~/ec-svn/repo/db]$ ls -l log.*
-rw-rw-r--    1 svn      svn       1046553 Nov 14 15:17 log.0000002124
-rw-rw-r--    1 svn      svn        954652 Nov 15 13:56 log.0000002125

Note that log.0000002125 grew by about 13 KB just from a simple history
listing.  A verify on a large repo can easily roll the log over many times.

It's hard to say which one is more prone to wedging due to permission
issues.  Barring performance problems with one or the other, I think it
basically comes down to whichever one you are most comfortable with.

-Dominic


Re: Update from post-commit hook

Posted by Mark Parker <ma...@msdhub.com>.
Ryan Schmidt wrote:
> It should be ok, but I would recommend using a file:/// checkout  
> instead because it will be faster. Make sure you have a FSFS  
> repository, as BDB repositories have problems with concurrent access  
> over different protocols.

No, BDB doesn't have problems with multiple access methods (or at least 
any problems that FSFS doesn't have). You run into problems when process 
A creates files in the repository (usually logs for BDB) that are 
unreadable/unwriteable for process B. This is (as far as I understand) 
MORE likely to happen with FSFS, because EVERY COMMIT is GUARANTEED to 
create one or more new files in the repository, while BDB logfiles are 
only created when the last one fills up.

I've taken my information from the book 
(http://svnbook.red-bean.com/nightly/en/svn.serverconfig.multimethod.html), 
and if I've misinterpreted something here, please feel free to correct me.

Mark



Re: Update from post-commit hook

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Nov 9, 2005, at 22:28, Phil Endecott wrote:

>> I would definitely recommend spawning off another process
>> somehow, but simply "svn update &" may be too simplistic. If Bob
>> commits r100 at 12:00:00 and the svn update process gets spawned
>> at 12:00:01 and takes 10 seconds to complete, and Joe commits
>> r101 at 12:00:02 firing off another svn update process at
>> 12:00:03, then Joe's update process will probably throw an error
>> that the working copy is in use / dirty / whatever it says
>
> If the update instead waited for the working copy to be unlocked,  
> rather than failing immediately, there wouldn't be a problem.

The exact message printed if you try to update a working copy twice  
simultaneously is this:

svn: Working copy '.' locked
svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for  
details)

The point is that a human must decide if it is appropriate to run svn  
cleanup. It is not appropriate if there is another task updating the  
working copy. It is appropriate if there merely was another such task  
and it has crashed or terminated improperly.

If, as you suggest, svn update simply waited for the working copy to  
be unlocked, it would wait all day if the previous update in fact had  
a problem.


> Is this an issue only when the hook script is backgrounded, or does  
> it also apply in the normal case?

I do not know how Subversion handles the timing of the situation  
where two people try to commit different changes at about the same  
time. I have a hunch that Subversion handles only one commit at a  
time, so the problem would not exist if you did not background the task.


> Is there a lock somewhere that is not released until the post- 
> commit hook has finished?  I had assumed that the locks were  
> released at the point of commit.

I'm talking about locks relating to the working copy, not the  
repository. Whenever you say svn update (and probably other svn  
commands), the working copy is locked to prevent another process from  
trying to work on the same working copy. When the first task is done,  
it's supposed to unlock the working copy again.




Re: Update from post-commit hook

Posted by Phil Endecott <sp...@chezphil.org>.
Ryan Schmidt wrote:
> Phil Endecott wrote:
>> - Where do error messages from the hook scripts go?  (Using Apache.)
> ...error messages go nowhere.

Not a great decision, I must say.  How hard would it be to have them 
appear in Apache's error log?

>> - I don't want this to slow down commits if I can help it.  Is it OK 
>> to background the hook script, i.e. to have "svn update &" in the 
>> post-commit file?
> 
> I would definitely recommend spawning off another process somehow, but 
> simply "svn update &" may be too simplistic. If Bob commits r100 at 
> 12:00:00 and the svn update process gets spawned at 12:00:01 and takes 
> 10 seconds to complete, and Joe commits r101 at 12:00:02 firing off 
> another svn update process at 12:00:03, then Joe's update process will 
> probably throw an error that the working copy is in use / dirty / 
> whatever it says

If the update instead waited for the working copy to be unlocked, rather 
than failing immediately, there wouldn't be a problem.  Is this an issue 
only when the hook script is backgrounded, or does it also apply in the 
normal case?  Is there a lock somewhere that is not released until the 
post-commit hook has finished?  I had assumed that the locks were 
released at the point of commit.

Cheers,

--Phil.



Re: Update from post-commit hook

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Nov 9, 2005, at 21:05, Phil Endecott wrote:

> Dear All,
>
> I have been thinking about using a post-commit hook to keep a  
> public "latest snapshot" of part of the repository up to date.   
> This could be as simple as putting
>
> 	cd /latest/snapshot/; svn update

The hook script runs with no environment. You'll need to call any
executables by their complete path, e.g. /usr/bin/svn. You can test
your hook script on the command line by running something like
"env -i /path/to/hooks/post-commit". This is useful for debugging because...

> in the post-commit hook.  I've done some quick experiments which  
> have not been very successful, so I have some questions:
>
> - Where do error messages from the hook scripts go?  (Using Apache.)

...error messages go nowhere. You can redirect them to a log file if  
you so desire, like

cd /latest/snapshot
/usr/bin/svn update 2>/path/to/svnupdate.log
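
A slightly fuller sketch of the same idea, in case it helps: it appends
rather than overwrites, captures stdout as well as stderr, and records
which revision triggered the run. The function name run_logged and the
LOG default are made up for illustration; the revision number is
assumed to arrive as the hook's second argument.

```shell
#!/bin/sh
# Hypothetical log location; the user running the hook (e.g. the
# Apache user) must be able to write to it.
LOG=${LOG:-/path/to/svnupdate.log}

# Run a command, appending both its stdout and stderr to $LOG after a
# timestamped header line naming the revision. Returns the command's
# own exit status.
run_logged() {
    rev=$1
    shift
    {
        echo "== r$rev $(date '+%Y-%m-%d %H:%M:%S'): $*"
        "$@"
    } >>"$LOG" 2>&1
}
```

In the post-commit hook this might be called as
run_logged "$2" /usr/bin/svn update /latest/snapshot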


> - Access is normally via Apache; is the nested call to svn OK, or  
> does /latest/snapshot/ need to be a file: checkout, or what?  (One  
> of my experiments led to a runaway svn process, making me think  
> that something recursive was going on.)

It should be ok, but I would recommend using a file:/// checkout  
instead because it will be faster. Make sure you have a FSFS  
repository, as BDB repositories have problems with concurrent access  
over different protocols.


> - (Possibly related to the above:) I require HTTP AUTH for both read  
> and write to the repository.  How can the apache user, who runs the  
> hook script, authenticate itself in the nested call?

You would need to figure out where the Apache user's home is, set up  
a .subversion directory with cached authentication credentials  
inside. It'll be easier if you just use a file:/// checkout and  
bypass all that.


> - Only part of the repository is being tracked in this snapshot.  I  
> could make the update conditional by checking if it has changed  
> using svnlook, maybe something like: "svnlook changed | grep -q  
> something && svn update".  But maybe an svn update when nothing has  
> changed is just as fast - any comments?

In the script I wrote I first got a list of all paths changed by the  
commit, then I had a mapping of snapshot working copies to repository  
paths, and updated only the relevant working copies. I'm not sure if  
this ended up being faster or slower than just updating everything  
all the time. The problem in our setup was (and continues to be) the  
RAM cache, which keeps getting thrown out and used for other things.  
In testing, everything is very quick because data about the working  
copy files remains in the server's cache. As soon as it's in  
production, though, and our 10 developers are using Apache and  
sending mails and everything else, the cache thrashes about and it  
takes forever to do things.
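
For the simple one-snapshot case, the check could look roughly like the
sketch below. This is not the script described above. The function name
touches_prefix and the trunk/public/ prefix are hypothetical, and it
assumes that "svnlook changed" prints its status flags in the first
columns with the path starting at column 5.

```shell
#!/bin/sh
# Succeed if any changed path read from stdin starts with the given
# prefix. Input is expected in "svnlook changed" format, e.g.
#   U   trunk/public/index.html
# The prefix is compared as a plain string, not as a regular expression.
touches_prefix() {
    awk -v prefix="$1" '
        index(substr($0, 5), prefix) == 1 { found = 1 }
        END { exit !found }
    '
}
```

In the hook, with $1 as the repository path and $2 as the revision:
/usr/bin/svnlook changed -r "$2" "$1" | touches_prefix trunk/public/ \
    && /usr/bin/svn update /latest/snapshot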


> - I don't want this to slow down commits if I can help it.  Is it  
> OK to background the hook script, i.e. to have "svn update &" in  
> the post-commit file?

I would definitely recommend spawning off another process somehow,  
but simply "svn update &" may be too simplistic. If Bob commits r100  
at 12:00:00 and the svn update process gets spawned at 12:00:01 and  
takes 10 seconds to complete, and Joe commits r101 at 12:00:02 firing  
off another svn update process at 12:00:03, then Joe's update process  
will probably throw an error that the working copy is in use /  
dirty / whatever it says, and when all's said and done, by 12:00:15,  
the snapshot will be up to date with r100 but will not update to  
r101, at least not until Sally commits r102 at 15:34:00.

While I have thought of these issues, I have not yet programmed the  
correct solution to them; if someone else has gone that extra mile,  
I'd sure like to hear what the best solution is.
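
One possible approach, offered as an untested-in-production sketch
rather than a known best solution: serialize the updates with a lock of
the hook's own, so that a second update waits for the first instead of
failing on the locked working copy. The paths, variable names and the
600-second limit below are all assumptions. It relies on mkdir being
atomic, so only one process can create the lock directory.

```shell
#!/bin/sh
# Hypothetical locations; adjust for your installation.
SVN=${SVN:-/usr/bin/svn}
WC=${WC:-/latest/snapshot}
LOCKDIR=${LOCKDIR:-/var/tmp/snapshot-update.lock}
MAX_WAIT=${MAX_WAIT:-600}   # seconds to wait before giving up

# Take the lock by creating a directory. Retry once a second and give
# up after MAX_WAIT seconds so a stuck update cannot block forever.
acquire_lock() {
    waited=0
    until mkdir "$LOCKDIR" 2>/dev/null; do
        [ "$waited" -ge "$MAX_WAIT" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

release_lock() {
    rmdir "$LOCKDIR"
}

# Update the snapshot while holding the lock; return svn's exit status.
update_snapshot() {
    acquire_lock || { echo "gave up waiting for $LOCKDIR" >&2; return 1; }
    "$SVN" update "$WC"
    status=$?
    release_lock
    return $status
}
```

The hook would then end with something like
update_snapshot >>/path/to/svnupdate.log 2>&1 &
In the r100/r101 example, Joe's update would wait for Bob's to finish
and then bring the snapshot to r101. If an update is killed before it
releases the lock, the directory stays behind and someone still has to
decide whether it is safe to remove it.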


