Posted to dev@subversion.apache.org by Garrett Rooney <ro...@electricjellyfish.net> on 2006/02/22 20:55:14 UTC

Ways to keep users from checking out too much.

In my copious spare time I help to administer http://svn.apache.org/,
a rather large and busy subversion server.

Over the past few weeks, we've had at least two instances (that we
noticed) of users causing noticeable performance problems by checking
out huge parts of the repository.  In one of those cases it was on
purpose (a search engine trying to check out and index the repository)
and in the other it's quite possible it was an accident.  In either case,
though, we ended up blocking the user's IP via ipfw rules to make the
problem go away.  This is a rather manpower-intensive solution to the
problem, and I'd love it if there were a way to prevent the problem
from occurring.

How would people feel about some mechanism for stopping update reports
rooted at particular directories?  It might at least prevent the
accidental foot-shooting you get when an inexperienced user first
tries their hand at an svn checkout, and that would be a nice step in
the right direction.

Any thoughts?

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Ways to keep users from checking out too much.

Posted by Chia-liang Kao <cl...@clkao.org>.
Garrett Rooney <rooneg <at> electricjellyfish.net> writes:
> > Agreed.  I'd like to ensure that replay can only be done by authorized
> > users/clients (if that's a hook, so be it).  If we allow replay on
> > some big public servers, it'd be a bad thing for those shared
> > resources if everyone can do it.  We can do a 'email us if you want to
> > be on the replay list' - this is what most RBLs do for rsync services,
> > etc.  Yes, they can also do a checkout; but this feature seems a bit
> > scary without a way to curb its capability.  I shudder to think what a
> > 380000-revision replay is going to look like.
> 
> If there's going to be such a hook, I'd just ask that it at least take
> into account things like the base directory of the replay, so that
> people can use replay to mirror particular branches, which seems safe,
> but be kept from mirroring entire projects or repositories, which
> would be dangerous.

I don't think it makes sense to restrict a convenient API but not the
expensive one that the user can use to do essentially the same thing,
such as what SVN::Mirror does: reverse engineering the log output and using
the fulltext delta to reconstruct a revision.

I guess a sensible default in the client to DTRT is more of a priority than
a server-side throttle.

Cheers,
CLK




Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
> On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> > Um... aren't you the guy that just implemented the equivalent of
> > 'svnadmin dump' over the RA layer?  Does that not generate a similar
> > level of system strain?
>
> Agreed.  I'd like to ensure that replay can only be done by authorized
> users/clients (if that's a hook, so be it).  If we allow replay on
> some big public servers, it'd be a bad thing for those shared
> resources if everyone can do it.  We can do a 'email us if you want to
> be on the replay list' - this is what most RBLs do for rsync services,
> etc.  Yes, they can also do a checkout; but this feature seems a bit
> scary without a way to curb its capability.  I shudder to think what a
> 380000-revision replay is going to look like.

If there's going to be such a hook, I'd just ask that it at least take
into account things like the base directory of the replay, so that
people can use replay to mirror particular branches, which seems safe,
but be kept from mirroring entire projects or repositories, which
would be dangerous.

-garrett



Re: Ways to keep users from checking out too much.

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> Um... aren't you the guy that just implemented the equivalent of
> 'svnadmin dump' over the RA layer?  Does that not generate a similar
> level of system strain?

Agreed.  I'd like to ensure that replay can only be done by authorized
users/clients (if that's a hook, so be it).  If we allow replay on
some big public servers, it'd be a bad thing for those shared
resources if everyone can do it.  We can do a 'email us if you want to
be on the replay list' - this is what most RBLs do for rsync services,
etc.  Yes, they can also do a checkout; but this feature seems a bit
scary without a way to curb its capability.  I shudder to think what a
380000-revision replay is going to look like.  -- justin



Re: Ways to keep users from checking out too much.

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On 2/22/06, Alexander Kitaev <al...@tmate.org> wrote:
> and this of course could slow down everything. Maybe it makes sense to
> dedicate a separate repository to each project: indexing software will
> only index separate repositories, users will not be able to check out

For social reasons, the ASF strongly prefers the single repository
model.  This allows easy cross-pollination between our projects. 
Therefore, the ASF's response to any slowdowns is to make Subversion
faster for us - not to split the repositories.  ;-)  -- justin



Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Alexander Kitaev <al...@tmate.org> wrote:
> Hello Garrett,
>
> Do I understand correctly that svn.apache.org is organized as a single
> repository that contains a lot of Apache projects, each with its own tags
> and branches? I mean the one at http://svn.apache.org/repos/asf/ that currently
> includes more than 379000 revisions.
>
> If you're talking about that repository, from my point of view (I'm not
> a Subversion developer), making a checkout from such a repository (even not from
> the root) makes the Subversion server go through all those thousands of revisions,
> and this of course could slow down everything. Maybe it makes sense to
> dedicate a separate repository to each project: indexing software will
> only index separate repositories, users will not be able to check out
> everything, and in general performance will be better. Excuse me if I mixed
> something up...

The problem is that even checking out the root of a single project
(and thus all its tags, branches, etc.) is still enough to cause a
noticeable problem.  Splitting the repository wouldn't stop that sort
of problem.

-garrett



RE: Ways to keep users from checking out too much.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2006-02-22 at 23:31 +0100, Alexander Kitaev wrote:
> If you're talking about that repository, from my point of view (I'm not
> a Subversion developer), making a checkout from such a repository (even not from
> the root) makes the Subversion server go through all those thousands of revisions,
> and this of course could slow down everything.

Subversion's back end design is such that you don't have to "go through"
revision files which aren't relevant to the subtree you're checking out.



RE: Ways to keep users from checking out too much.

Posted by Alexander Kitaev <al...@tmate.org>.
Hello Garrett,

Do I understand correctly that svn.apache.org is organized as a single
repository that contains a lot of Apache projects, each with its own tags
and branches? I mean the one at http://svn.apache.org/repos/asf/ that currently
includes more than 379000 revisions.

If you're talking about that repository, from my point of view (I'm not
a Subversion developer), making a checkout from such a repository (even not from
the root) makes the Subversion server go through all those thousands of revisions,
and this of course could slow down everything. Maybe it makes sense to
dedicate a separate repository to each project: indexing software will
only index separate repositories, users will not be able to check out
everything, and in general performance will be better. Excuse me if I mixed
something up...

Alexander Kitaev,
TMate Software,
http://tmate.org/
http://jetbrains.com/tmate/ 

> -----Original Message-----
> From: rooneg@gmail.com [mailto:rooneg@gmail.com] On Behalf Of 
> Garrett Rooney
> Sent: Wednesday, February 22, 2006 23:11
> To: C. Michael Pilato
> Cc: Jim Blandy; dev@subversion.tigris.org
> Subject: Re: Ways to keep users from checking out too much.
> 
> On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> > C. Michael Pilato wrote:
> > > Garrett Rooney wrote:
> > >
> > >
> > >>Honestly, I don't care one way or another if they can --force it or
> > >>not.  Checking out trees that large puts an unacceptable amount of
> > >>strain on a public resource in this case, I just want to be able to
> > >>stop them from making silly mistakes that require administrator
> > >>effort to block, and reserve the admin effort for the cases where
> > >>people are actually doing this kind of thing on purpose.  If they
> > >>can add --force and make it work, that's nice, but it's not really
> > >>a showstopper IMO.
> > >
> > >
> > > Um... aren't you the guy that just implemented the equivalent of
> > > 'svnadmin dump' over the RA layer?  Does that not generate a similar
> > > level of system strain?
> >
> > Sorry!  That's got a good chance of reading like, "You can't complain
> > because you're a troublemaker of the same flavor!"
> >
> > What I mean is simply, "There are probably lots of ways to put strain
> > on a Subversion server.  Which of them are you wanting to block, and
> > if not all of them, why not?"
>
> I accept the fact that there's no way to keep a determined
> user from DOSing the system, when you run a publicly
> accessible server you just have to live with that and be able
> to block that traffic at other levels if needed.  What I do
> want to be able to do is make it sufficiently hard to do that
> a user won't be likely to do it accidentally.
> 
> -garrett
> 

Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> C. Michael Pilato wrote:
> > Garrett Rooney wrote:
> >
> >
> >>Honestly, I don't care one way or another if they can --force it or
> >>not.  Checking out trees that large puts an unacceptable amount of
> >>strain on a public resource in this case, I just want to be able to
> >>stop them from making silly mistakes that require administrator effort
> >>to block, and reserve the admin effort for the cases where people are
> >>actually doing this kind of thing on purpose.  If they can add --force
> >>and make it work, that's nice, but it's not really a showstopper IMO.
> >
> >
> > Um... aren't you the guy that just implemented the equivalent of
> > 'svnadmin dump' over the RA layer?  Does that not generate a similar
> > level of system strain?
>
> Sorry!  That's got a good chance of reading like, "You can't complain
> because you're a troublemaker of the same flavor!"
>
> What I mean is simply, "There are probably lots of ways to put strain on
> a Subversion server.  Which of them are you wanting to block, and if not
> all of them, why not?"

I accept the fact that there's no way to keep a determined user from
DOSing the system, when you run a publicly accessible server you just
have to live with that and be able to block that traffic at other
levels if needed.  What I do want to be able to do is make it
sufficiently hard to do that a user won't be likely to do it
accidentally.

-garrett



Re: Ways to keep users from checking out too much.

Posted by "C. Michael Pilato" <cm...@collab.net>.
C. Michael Pilato wrote:
> Garrett Rooney wrote:
> 
> 
>>Honestly, I don't care one way or another if they can --force it or
>>not.  Checking out trees that large puts an unacceptable amount of
>>strain on a public resource in this case, I just want to be able to
>>stop them from making silly mistakes that require administrator effort
>>to block, and reserve the admin effort for the cases where people are
>>actually doing this kind of thing on purpose.  If they can add --force
>>and make it work, that's nice, but it's not really a showstopper IMO.
> 
> 
> Um... aren't you the guy that just implemented the equivalent of
> 'svnadmin dump' over the RA layer?  Does that not generate a similar
> level of system strain?

Sorry!  That's got a good chance of reading like, "You can't complain
because you're a troublemaker of the same flavor!"

What I mean is simply, "There are probably lots of ways to put strain on
a Subversion server.  Which of them are you wanting to block, and if not
all of them, why not?"


-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: Ways to keep users from checking out too much.

Posted by Paul Querna <ch...@force-elite.com>.
Brian Behlendorf wrote:
> On Wed, 22 Feb 2006, Paul Querna wrote:
>> Brian Behlendorf wrote:
>>> On Wed, 22 Feb 2006, Garrett Rooney wrote:
>>>> It might be possible to do some magic in apache modules to make that
>>>> work, but I'm not sure how off the top of my head.  Would certainly be
>>>> nice though.
>>>
>>> mod_throttle?  mod_bandwidth?
>>
>> These existing modules wouldn't actually fix much, since they are 
>> generally designed to limit the number of concurrent requests, and/or 
>> bandwidth used, neither of which is causing the load problem.
> 
> Agreed on concurrent requests, but bandwidth used may be relevant, since 
> responding to a big request is a combination of server CPU time, local 
> disk I/O, and network I/O.  If you rate limit network I/O to that 
> particular client, then you probably also effectively rate limit the 
> other two, since we're not pipelining at all yet, and even when we do 
> there'll be a limit to how deep the pipeline will go.
> 
>> One thing you can do, is drop the priority of the Subversion HTTPD 
>> Process, so it doesn't starve the machine for IO and CPU.
> 
> What about a multithreaded httpd?  Is there a cross-platform way to drop 
> a priority of a thread?

It doesn't work in a multithreaded httpd.

Most people, however, had better not be running svn in a multithreaded
httpd if they are using any hook scripts.  It's a mess, but that's a
whole other topic.

AFAIK, there isn't a cross-platform way to drop the priority of a thread.

-Paul



Re: Ways to keep users from checking out too much.

Posted by Brian Behlendorf <br...@collab.net>.
On Wed, 22 Feb 2006, Paul Querna wrote:
> Brian Behlendorf wrote:
>> On Wed, 22 Feb 2006, Garrett Rooney wrote:
>>> It might be possible to do some magic in apache modules to make that
>>> work, but I'm not sure how off the top of my head.  Would certainly be
>>> nice though.
>> 
>> mod_throttle?  mod_bandwidth?
>
> These existing modules wouldn't actually fix much, since they are generally 
> designed to limit the number of concurrent requests, and/or bandwidth used, 
> neither of which is causing the load problem.

Agreed on concurrent requests, but bandwidth used may be relevant, since 
responding to a big request is a combination of server CPU time, local 
disk I/O, and network I/O.  If you rate limit network I/O to that 
particular client, then you probably also effectively rate limit the other 
two, since we're not pipelining at all yet, and even when we do there'll 
be a limit to how deep the pipeline will go.
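
The back-pressure argument above can be illustrated with a small Python
sketch.  This is not Apache or mod_dav_svn code; throttled_send and its
parameters are hypothetical names chosen for the illustration.  The point
is only that when the response is produced lazily, pacing the network
writes also paces the work done to produce each chunk.

```python
import time


def throttled_send(chunks, bytes_per_sec, send,
                   clock=time.monotonic, sleep=time.sleep):
    """Send chunks no faster than bytes_per_sec (hypothetical helper).

    `chunks` is any iterable; if it is a generator, the CPU and disk
    work needed to build the next chunk only happens when this loop
    asks for it, so slowing the writes slows that work too.
    """
    start = clock()
    sent = 0
    for chunk in chunks:
        send(chunk)
        sent += len(chunk)
        # Earliest time at which `sent` bytes fit within the rate limit.
        earliest = start + sent / bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

The clock and sleep arguments are injectable only so the pacing can be
checked without real waiting.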

> One thing you can do, is drop the priority of the Subversion HTTPD Process, 
> so it doesn't starve the machine for IO and CPU.

What about a multithreaded httpd?  Is there a cross-platform way to drop a 
priority of a thread?

It seems like there are two responses to an "abusive request": to 
either reject it or to allow it to progress in a way that does not 
interfere with others.  In the first case, Garrett's idea to filter the 
REPORT mechanism might work, or it might simply be a question of how 
flexible we make ACLs and authz.  In the second case, this is a very 
familiar topic in operating systems design (how to keep one process from 
soaking up so much time that it "unfairly" robs other processes of the 
right to complete).  I'm tempted to say let's let the OS worry about the 
problem by using POSIX or OS-specific semantics where possible to rate 
limit or lower priority of the responding thread, making sure we don't 
exacerbate the problem in some way (like exclusive lock during read). 
Overall I'd say let the administrator decide which of those two approaches 
to take.

 	Brian



Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Paul Querna <ch...@force-elite.com> wrote:
> Brian Behlendorf wrote:
> > On Wed, 22 Feb 2006, Garrett Rooney wrote:
> >> It might be possible to do some magic in apache modules to make that
> >> work, but I'm not sure how off the top of my head.  Would certainly be
> >> nice though.
> >
> > mod_throttle?  mod_bandwidth?
> >
> >     Brian
>
> These existing modules wouldn't actually fix much, since they are
> generally designed to limit the number of concurrent requests, and/or
> bandwidth used, neither of which is causing the load problem.
>
> One thing you can do is drop the priority of the Subversion httpd
> process, so it doesn't starve the machine for I/O and CPU.
>
> So, I hacked up a 'mod_renice' tonight:
>   http://paul.querna.org/~chip/mod_renice.c
>
> The basic flow is to call setpriority() when we detect a REPORT method on
> any repo path matching a regular expression.  When the request ends, we
> restore our original priority.
>
> It's only been tested lightly on FreeBSD 6. YMMV.

I'm also investigating the possibility of using an Apache filter to
look at the contents of a REPORT request and send back errors for
particularly stupid requests.  This is, of course, a bit more complex,
what with the XML parsing and whatnot, but it's pretty similar to my
mod_speedyfeed module, so I may be able to make it work.

-garrett
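
A rough sketch of the check such a filter might perform, written in
Python rather than as an Apache filter.  It assumes the REPORT body is an
update-report element in the "svn:" namespace with a src-path child, which
is my understanding of what clients send; the block list and the function
name check_report are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SVN_NS = "{svn:}"

# Hypothetical block list: paths nobody should check out in one go.
BLOCKED_ROOTS = {"/repos/asf", "/repos/asf/httpd"}


def check_report(body, blocked=BLOCKED_ROOTS):
    """Return an error message for a blocked update report, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not our job to reject malformed XML here
    if root.tag != SVN_NS + "update-report":
        return None  # some other kind of REPORT, let it through
    src = root.findtext(SVN_NS + "src-path", default="")
    path = urlparse(src.strip()).path.rstrip("/")
    if path in blocked:
        return ("Checking out %s is not allowed; "
                "please pick a subdirectory such as trunk." % path)
    return None
```

A real filter would also have to deal with request bodies arriving in
pieces, which this sketch ignores.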



Re: Ways to keep users from checking out too much.

Posted by Paul Querna <ch...@force-elite.com>.
Brian Behlendorf wrote:
> On Wed, 22 Feb 2006, Garrett Rooney wrote:
>> It might be possible to do some magic in apache modules to make that
>> work, but I'm not sure how off the top of my head.  Would certainly be
>> nice though.
> 
> mod_throttle?  mod_bandwidth?
> 
>     Brian

These existing modules wouldn't actually fix much, since they are 
generally designed to limit the number of concurrent requests, and/or 
bandwidth used, neither of which is causing the load problem.

One thing you can do is drop the priority of the Subversion httpd
process, so it doesn't starve the machine for I/O and CPU.

So, I hacked up a 'mod_renice' tonight:
  http://paul.querna.org/~chip/mod_renice.c

The basic flow is to call setpriority() when we detect a REPORT method on
any repo path matching a regular expression.  When the request ends, we
restore our original priority.

It's only been tested lightly on FreeBSD 6. YMMV.

-Paul
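
For readers who want the shape of that flow without reading the C, here is
a minimal Python sketch.  It is not the mod_renice source; the pattern,
the nice delta, and the names is_heavy_request and reniced are all
assumptions made for the example.  It uses os.getpriority and
os.setpriority, which are available on Unix.

```python
import os
import re
from contextlib import contextmanager

# Hypothetical policy values, not taken from mod_renice.
HEAVY_PATH = re.compile(r"^/repos/asf/?$")
NICE_DELTA = 10


def is_heavy_request(method, path, pattern=HEAVY_PATH):
    """True for a REPORT request whose path matches the pattern."""
    return method == "REPORT" and pattern.search(path) is not None


@contextmanager
def reniced(delta=NICE_DELTA, getprio=None, setprio=None):
    """Lower this process's priority for the duration of a request."""
    getprio = getprio or (lambda: os.getpriority(os.PRIO_PROCESS, 0))
    setprio = setprio or (lambda v: os.setpriority(os.PRIO_PROCESS, 0, v))
    original = getprio()
    setprio(min(original + delta, 19))  # 19 is the usual nice ceiling
    try:
        yield
    finally:
        # Restoring a better priority may need extra privileges on
        # some systems; a real module has to account for that.
        setprio(original)
```

A request handler would wrap its work in "with reniced():" only when
is_heavy_request() returns true.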


Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Brian Behlendorf <br...@collab.net> wrote:
> On Wed, 22 Feb 2006, Garrett Rooney wrote:
> > It might be possible to do some magic in apache modules to make that
> > work, but I'm not sure how off the top of my head.  Would certainly be
> > nice though.
>
> mod_throttle?  mod_bandwidth?

Certainly worth investigating, although for the accidental case I'd
prefer to be able to give the user a quick "don't do that!" message
instead of just making their request take a long time (during which
they're holding open a connection and taking some amount of the
server's resources).

-garrett



Re: Ways to keep users from checking out too much.

Posted by Brian Behlendorf <br...@collab.net>.
On Wed, 22 Feb 2006, Garrett Rooney wrote:
> It might be possible to do some magic in apache modules to make that
> work, but I'm not sure how off the top of my head.  Would certainly be
> nice though.

mod_throttle?  mod_bandwidth?

 	Brian



Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Arlie Davis <ad...@stonestreetone.com> wrote:
> Would it be possible, instead, to make long-running operations run at a
> lower priority than short-running operations?
>
> I'm a newcomer to SVN design, so this really is a question, and not a
> suggestion.

It might be possible to do some magic in apache modules to make that
work, but I'm not sure how off the top of my head.  Would certainly be
nice though.

-garrett



RE: Ways to keep users from checking out too much.

Posted by Arlie Davis <ad...@stonestreetone.com>.
Would it be possible, instead, to make long-running operations run at a
lower priority than short-running operations? 

I'm a newcomer to SVN design, so this really is a question, and not a
suggestion.

-- arlie


-----Original Message-----
From: rooneg@gmail.com [mailto:rooneg@gmail.com] On Behalf Of Garrett Rooney
Sent: Wednesday, February 22, 2006 5:02 PM
To: C. Michael Pilato
Cc: Jim Blandy; dev@subversion.tigris.org
Subject: Re: Ways to keep users from checking out too much.

On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> Garrett Rooney wrote:
>
> > Honestly, I don't care one way or another if they can --force it or 
> > not.  Checking out trees that large puts an unacceptable amount of 
> > strain on a public resource in this case, I just want to be able to 
> > stop them from making silly mistakes that require administrator 
> > effort to block, and reserve the admin effort for the cases where 
> > people are actually doing this kind of thing on purpose.  If they 
> > can add --force and make it work, that's nice, but it's not really a
> > showstopper IMO.
>
> Um... aren't you the guy that just implemented the equivalent of 
> 'svnadmin dump' over the RA layer?  Does that not generate a similar 
> level of system strain?

Sure, but that's actually less strain than the way people were doing the
same thing before I implemented that.  It's more effort to do the job via
diff than via replay ;-)

-garrett


Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> Garrett Rooney wrote:
>
> > Honestly, I don't care one way or another if they can --force it or
> > not.  Checking out trees that large puts an unacceptable amount of
> > strain on a public resource in this case, I just want to be able to
> > stop them from making silly mistakes that require administrator effort
> > to block, and reserve the admin effort for the cases where people are
> > actually doing this kind of thing on purpose.  If they can add --force
> > and make it work, that's nice, but it's not really a showstopper IMO.
>
> Um... aren't you the guy that just implemented the equivalent of
> 'svnadmin dump' over the RA layer?  Does that not generate a similar
> level of system strain?

Sure, but that's actually less strain than the way people were doing
the same thing before I implemented that.  It's more effort to do the
job via diff than via replay ;-)

-garrett



Re: Ways to keep users from checking out too much.

Posted by "C. Michael Pilato" <cm...@collab.net>.
Garrett Rooney wrote:

> Honestly, I don't care one way or another if they can --force it or
> not.  Checking out trees that large puts an unacceptable amount of
> strain on a public resource in this case, I just want to be able to
> stop them from making silly mistakes that require administrator effort
> to block, and reserve the admin effort for the cases where people are
> actually doing this kind of thing on purpose.  If they can add --force
> and make it work, that's nice, but it's not really a showstopper IMO.

Um... aren't you the guy that just implemented the equivalent of
'svnadmin dump' over the RA layer?  Does that not generate a similar
level of system strain?

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: Ways to keep users from checking out too much.

Posted by Robert Spier <rs...@pobox.com>.
> Just to be clear, the search engine we had a problem with at
> svn.apache.org was actually doing a checkout of big chunks of the
> repos, not just crawling the http interface like you'd normally
> expect.  Normal http crawlers can be dealt with via robots.txt, so I
> don't see any reason to build in support for blocking that sort of
> thing in subversion itself.

This is similar to what svk can do.  While none of my (perl.org)
repositories are as big as the apache repository, someone deciding to
check out every revision of every path can definitely bring the server
and pipe to its knees.  

-R


Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Jim Blandy <ji...@red-bean.com> wrote:
> On 2/22/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > Honestly, I don't care one way or another if they can --force it or
> > not.  Checking out trees that large puts an unacceptable amount of
> > strain on a public resource in this case, I just want to be able to
> > stop them from making silly mistakes that require administrator effort
> > to block, and reserve the admin effort for the cases where people are
> > actually doing this kind of thing on purpose.  If they can add --force
> > and make it work, that's nice, but it's not really a showstopper IMO.
>
> It seems to me that there are several ways this problem can come about
> for which an administrator could plausibly want different policies.
>
> Wanting to prevent accidental checkouts of the root by users who don't
> understand the repository organization seems like something almost
> every Subversion installation would want to do.  If the protection was
> done in a way that wasn't too confusing to new users, I'd say it
> should be on by default.
>
> Wanting to prevent root checkouts by search engine robots seems more
> like a site-by-site decision.

Just to be clear, the search engine we had a problem with at
svn.apache.org was actually doing a checkout of big chunks of the
repos, not just crawling the http interface like you'd normally
expect.  Normal http crawlers can be dealt with via robots.txt, so I
don't see any reason to build in support for blocking that sort of
thing in subversion itself.

-garrett



Re: Ways to keep users from checking out too much.

Posted by Michael Sinz <Mi...@sinz.org>.
On 2/22/06, Jim Blandy <ji...@red-bean.com> wrote:
> On 2/22/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > Honestly, I don't care one way or another if they can --force it or
> > not.  Checking out trees that large puts an unacceptable amount of
> > strain on a public resource in this case, I just want to be able to
> > stop them from making silly mistakes that require administrator effort
> > to block, and reserve the admin effort for the cases where people are
> > actually doing this kind of thing on purpose.  If they can add --force
> > and make it work, that's nice, but it's not really a showstopper IMO.
>
> It seems to me that there are several ways this problem can come about
> for which an administrator could plausibly want different policies.
>
> Wanting to prevent accidental checkouts of the root by users who don't
> understand the repository organization seems like something almost
> every Subversion installation would want to do.  If the protection was
> done in a way that wasn't too confusing to new users, I'd say it
> should be on by default.

Absolutely.  In fact, in a corporate environment, where we don't really
need to worry about a DoS of the server by external agents, we do
get the "why did this take so long and fill up 20 GB of my disk?" question
at least once from every new user.  That is because we have many tags
and many branches (many more tags than branches) and some release
tags too.  Doing a "root" checkout ends up getting the code base over 100
times, which means that, well, the local WC is *very* big (and unneeded).

Preventing such checkouts would be a "very good thing" and doing
so via a hook, such that it can handle whatever layout you are using
and such that the return message can be specific to your site is even
better.  (I would almost say required, since the number of different ways
repositories and projects can be structured is well beyond any fixed
expression capability.)
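
A minimal sketch of what such a hook's policy might look like, in Python.
Subversion has no read hook today, so everything here is hypothetical: the
function name check_report_root, the BLOCKED_ROOTS table, and the
/project/{trunk,tags,branches} layout are assumptions for illustration only.

```python
import re

# Hypothetical site policy: report roots to refuse, each paired with a
# site-specific message.  Assumes a /<project>/{trunk,tags,branches} layout.
BLOCKED_ROOTS = [
    (re.compile(r"^/$"),
     "Please check out a project trunk, not the repository root."),
    (re.compile(r"^/[^/]+/(tags|branches)$"),
     "Please check out a single tag or branch, not all of them."),
]


def check_report_root(path):
    """Return (allowed, message) for a checkout or update rooted at path."""
    # Normalize so that "", "/", "proj/tags" and "/proj/tags/" all compare alike.
    normalized = "/" + path.strip("/")
    for pattern, message in BLOCKED_ROOTS:
        if pattern.match(normalized):
            return False, message
    return True, ""
```

Because the patterns and messages live in site configuration, each
installation could describe its own layout and tell users what to do instead.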

--
Michael Sinz               Technology and Engineering Director/Consultant
"Starting Startups"                          mailto:Michael.Sinz@sinz.org
My place on the web                      http://www.sinz.org/Michael.Sinz



Re: Ways to keep users from checking out too much.

Posted by Jim Blandy <ji...@red-bean.com>.
On 2/22/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> Honestly, I don't care one way or another if they can --force it or
> not.  Checking out trees that large puts an unacceptable amount of
> strain on a public resource in this case, I just want to be able to
> stop them from making silly mistakes that require administrator effort
> to block, and reserve the admin effort for the cases where people are
> actually doing this kind of thing on purpose.  If they can add --force
> and make it work, that's nice, but it's not really a showstopper IMO.

It seems to me that there are several ways this problem can come about
for which an administrator could plausibly want different policies.

Wanting to prevent accidental checkouts of the root by users who don't
understand the repository organization seems like something almost
every Subversion installation would want to do.  If the protection was
done in a way that wasn't too confusing to new users, I'd say it
should be on by default.

Wanting to prevent root checkouts by search engine robots seems more
like a site-by-site decision.



Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, Jim Blandy <ji...@red-bean.com> wrote:
> On 2/22/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > How would people feel about some mechanism for stopping update reports
> > rooted at particular directories?  It might at least prevent the
> > accidental foot shooting you get when an inexperienced user first
> > tries their hand at a svn checkout, and that would be a nice step in
> > the right direction.
>
> This seems to me like a pretty obvious hazard of the way Subversion
> repositories are laid out.  I think it would make sense to have
> something which would help people avoid doing it by accident, as long
> as it didn't prevent them from doing it on purpose.
>
> Say, a property which makes 'svn checkout' print a message and get a
> confirmation interactively from the user, or pass a '--force' flag?

Honestly, I don't care one way or another if they can --force it or
not.  Checking out trees that large puts an unacceptable amount of
strain on a public resource in this case; I just want to be able to
stop them from making silly mistakes that require administrator effort
to block, and reserve the admin effort for the cases where people are
actually doing this kind of thing on purpose.  If they can add --force
and make it work, that's nice, but it's not really a showstopper IMO.

-garrett



Re: Ways to keep users from checking out too much.

Posted by Jim Blandy <ji...@red-bean.com>.
On 2/22/06, Julian Foad <ju...@btopenworld.com> wrote:
> Normal Subversion properties cannot easily be applied to all revisions of an
> existing repository

You're right, of course.  Properties are not the right thing here.



Re: Ways to keep users from checking out too much.

Posted by Julian Foad <ju...@btopenworld.com>.
Jim Blandy wrote:
> On 2/22/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> 
>>How would people feel about some mechanism for stopping update reports
>>rooted at particular directories?  It might at least prevent the
>>accidental foot shooting you get when an inexperienced user first
>>tries their hand at a svn checkout, and that would be a nice step in
>>the right direction.
> 
> This seems to me like a pretty obvious hazard of the way Subversion
> repositories are laid out.  I think it would make sense to have
> something which would help people avoid doing it by accident, as long
> as it didn't prevent them from doing it on purpose.

+1.

> Say, a property which makes 'svn checkout' print a message and get a
> confirmation interactively from the user, or pass a '--force' flag?

My first reaction was "Ooh, yuck, hacky" but in fact something like this, in 
that it is detected and obeyed entirely at the client side, might well be an 
appropriate solution.  Skip to the end and you'll find I change my mind, but 
meanwhile let's air some of the pros and cons and design choices.


CLIENT-SIDE

Normal Subversion properties cannot easily be applied to all revisions of an 
existing repository, but on the other hand they are suitable in that the set of 
protected directories might occasionally change from one revision to the next 
(during tree reorganisations) and it's a good mechanism for getting centralised 
information to the client.

Because a checkout is most commonly done on the head revision, there may be 
little need to restrict checkouts of older revisions if the aim is to prevent 
accidental misuse.

Therefore a property that says, "you really don't want to check out these 
roots" for, typically,

/
/branches
/tags

(or the equivalent for each of the projects in the repository) could do the 
job.  As Jim suggests, the client should refuse to continue unless overridden 
with some sort of extra confirmation.  Of course, only new clients would do 
this.  Is that a problem?  Perhaps it will be a significant problem in the 
short term.
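
A minimal sketch of that client-side decision, in Python.  The property name
"example:discourage-checkout" and the function may_check_out are hypothetical;
Subversion defines no such property, and a real client would read the
properties from the server before starting the checkout.

```python
# Hypothetical property name; Subversion defines no such property.
NO_CHECKOUT_PROP = "example:discourage-checkout"


def may_check_out(props, force=False):
    """Decide whether a checkout of a directory should proceed.

    props is a dict of the versioned properties on the target directory.
    Returns (proceed, message); message is the property value to show the
    user when the checkout is refused, otherwise None.
    """
    message = props.get(NO_CHECKOUT_PROP)
    if message is None or force:
        # Either the directory is not tagged, or the user overrode the warning.
        return True, None
    return False, message
```

An old client that knows nothing about the property would simply ignore it,
which is the short-term problem mentioned above.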

It would presumably be a single-valued property set on each path.  It could 
alternatively be implemented as a single property containing a list of paths, 
but that seems to have only disadvantages unless we want to put it in a rev-prop.

The paths tagged should be those to be blocked, not those to be allowed, 
because there are only a few high-level paths to be blocked whereas it is quite 
reasonable to allow checkouts of any sub-tree of a project.  People might say 
sub-project checkouts don't make sense for their work environment and wish to 
block them, but that's not the motivation for this feature and that requirement 
would doubtless involve other things like blocking checkouts from certain 
revision ranges.

One advantage of a rev-prop is that it can be retroactively applied to old 
revisions of a project, but that doesn't feel like a great advantage.  A 
revprop doesn't automatically propagate to new revisions so a pre- or 
post-commit hook would have to accomplish that.  Another advantage is that an 
existing rev-prop can be changed to reflect a new administrative policy.


This configuration is conceptually an administrative policy, and it bothers me 
somewhat to have it stored in the repository as if it were a piece of the 
project's history.

What IS a part of the project's history is the concept of certain directories 
being project trees, others (lower) being project sub-trees, and others 
(higher) being not project tree roots but part of the repository organisational 
structure.  Unfortunately this classification is not clear-cut, but, if we're 
going to tag some of these directories with properties, it would be better for 
the properties to have a meaning and a name that reflects the function of the 
directory rather than trying to say directly what people should not do with it.


SERVER-SIDE

A server-side mechanism (e.g. pre-read hooks) has the advantages of applying 
easily to all clients (types and ages), applying to all revisions, keeping the 
administration separate from the project's content, and letting the 
administrator choose whether to enforce it or provide an override. 
Its main disadvantage is that an override is not so easy to provide.

Until I wrote that about the server side, I was thinking the client-side 
solution was pretty good.  Now I don't.  Even though it's probably more 
difficult to implement, I think a server-side mechanism is the right choice 
because all of the advantages I listed for it are significant and important.

- Julian


Re: Ways to keep users from checking out too much.

Posted by Jim Blandy <ji...@red-bean.com>.
On 2/22/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> How would people feel about some mechanism for stopping update reports
> rooted at particular directories?  It might at least prevent the
> accidental foot shooting you get when an inexperienced user first
> tries their hand at a svn checkout, and that would be a nice step in
> the right direction.

This seems to me like a pretty obvious hazard of the way Subversion
repositories are laid out.  I think it would make sense to have
something which would help people avoid doing it by accident, as long
as it didn't prevent them from doing it on purpose.

Say, a property which makes 'svn checkout' print a message and get a
confirmation interactively from the user, or pass a '--force' flag?



Re: Ways to keep users from checking out too much.

Posted by Michael Sinz <Mi...@sinz.org>.
kfogel@collab.net wrote:
> Greg Hudson <gh...@MIT.EDU> writes:
>> On Wed, 2006-02-22 at 12:55 -0800, Garrett Rooney wrote:
>>> How would people feel about some mechanism for stopping update reports
>>> rooted at particular directories?
>> I think this is a good idea, as a safety measure.  We just need to be
>> careful to document that it's purely a safety and not an access control;
>> a client could circumvent the mechanism by not using a report.
> 
> If our authz system worked in a certain way, this could be done
> entirely through authz.
> 
> Let ROOT be the root of the directory tree that you don't want
> checkouts to be rooted at.  If you create an object ROOT/FORBIDDEN,
> and tell authz that no one is allowed to read or write FORBIDDEN, then
> what happens (today) if you check out ROOT?  I believe you still get
> ROOT/*, with the exception of FORBIDDEN.  However, if the authz system
> had a flag you could set to say "Don't allow an operation to happen at
> all if any part of it is not permitted", then the ROOT/FORBIDDEN thing
> would solve Garrett's problem.

The danger here is that you would not want to block normal browsing of
the repository just because you cannot check out at the root.

authz does not have fine-grained access controls for things like checkout
versus get, for example.  Plus, many times I just do a diff between revision
numbers from "root" so that I don't need to remember a peg rev or which
branches happened to have changes between here and there...

> The performance costs might be quite high, I haven't thought that
> through enough yet.  I just wanted to try re-imagining this problem as
> a special case of authz, rather than as something special requiring
> new hooks or other new mechanism(s).

I would think that the cost of something that required full path checking
each time would be a bit high, but then a hook also adds a fair amount of
overhead.  (Albeit not much if the hook does not exist.)

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz


Re: Ways to keep users from checking out too much.

Posted by kf...@collab.net.
Greg Hudson <gh...@MIT.EDU> writes:
> On Wed, 2006-02-22 at 12:55 -0800, Garrett Rooney wrote:
> > How would people feel about some mechanism for stopping update reports
> > rooted at particular directories?
> 
> I think this is a good idea, as a safety measure.  We just need to be
> careful to document that it's purely a safety and not an access control;
> a client could circumvent the mechanism by not using a report.

If our authz system worked in a certain way, this could be done
entirely through authz.

Let ROOT be the root of the directory tree that you don't want
checkouts to be rooted at.  If you create an object ROOT/FORBIDDEN,
and tell authz that no one is allowed to read or write FORBIDDEN, then
what happens (today) if you check out ROOT?  I believe you still get
ROOT/*, with the exception of FORBIDDEN.  However, if the authz system
had a flag you could set to say "Don't allow an operation to happen at
all if any part of it is not permitted", then the ROOT/FORBIDDEN thing
would solve Garrett's problem.

The performance costs might be quite high; I haven't thought that
through enough yet.  I just wanted to try re-imagining this problem as
a special case of authz, rather than as something special requiring
new hooks or other new mechanism(s).
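
To make the two behaviours concrete, here is a minimal sketch in Python.  It
is not how authz is implemented; visible_paths, the denied set, and the
all_or_nothing flag are hypothetical names standing in for the idea above.

```python
def visible_paths(paths, denied, all_or_nothing=False):
    """Return the paths an operation may read.

    paths:  every path the operation would touch.
    denied: set of paths that no one is allowed to read.
    With all_or_nothing=False, denied paths are silently left out.
    With all_or_nothing=True, one denied path refuses the whole operation.
    """
    def is_denied(path):
        # A path is denied if it is a denied entry or lies underneath one.
        return any(path == d or path.startswith(d + "/") for d in denied)

    blocked = [p for p in paths if is_denied(p)]
    if blocked and all_or_nothing:
        raise PermissionError("operation refused: %s is not readable" % blocked[0])
    return [p for p in paths if not is_denied(p)]
```

The scan over every path in the operation is also where the performance
question comes from.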

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand


Re: Ways to keep users from checking out too much.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2006-02-22 at 12:55 -0800, Garrett Rooney wrote:
> How would people feel about some mechanism for stopping update reports
> rooted at particular directories?

I think this is a good idea, as a safety measure.  We just need to be
careful to document that it's purely a safety and not an access control;
a client could circumvent the mechanism by not using a report.



Re: Ways to keep users from checking out too much.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/22/06, C. Michael Pilato <cm...@collab.net> wrote:
> Garrett Rooney wrote:
> > How would people feel about some mechanism for stopping update reports
> > rooted at particular directories?  It might at least prevent the
> > accidental foot shooting you get when an inexperienced user first
> > tries their hand at a svn checkout, and that would be a nice step in
> > the right direction.
>
> That is a very odd feature request if expressed as something you'd want
> to put into Subversion itself.  It's the kind of thing that would work
> well if we had (as many folks have asked for) read hooks in the
> repository, but ...
>
> Are you sure you aren't asking for a Band-Aid (tm) to cover up some
> performance or cancel-ability problems in Subversion that would be
> better fixed directly?

Well, if checking out the root of a tree with lots of tags and
branches in it didn't take so damn much CPU and IO, this would be less
of an issue, but I suspect we're in one of those categories where it's
a matter of "well, you asked it to do X, why are you surprised when it
takes a while".

It's not so much an issue of cancelability.  I suspect the user could
cancel it any time they so desire, but by the time they've
realized it's a problem they've been using a big chunk of our CPU and
disk bandwidth for several minutes, which is kind of uncool.  And
that's if it's an accident; if it's not accidental then they have no
desire to stop it.

One thing that might be handy would be an easier way to detect these
kinds of users.  Right now, it's a matter of noticing that we've had an
httpd child process sucking up CPU for 5 or 6 minutes, checking
server-status and seeing that they're doing a REPORT, then
stracing the process and confirming that yes, they seem to be hitting
multiple tags within the same directory, and probably checking the svn
operational log to see if they've done similar stuff earlier in the
day.  It's a bit of a pain.
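
As a rough illustration of the "multiple tags within the same directory"
check, here is a sketch in Python.  It assumes the accesses have already been
pulled out of whatever log is available as (client, path) pairs; that input
format, the function name suspicious_clients, and the threshold are all
hypothetical.

```python
from collections import defaultdict


def suspicious_clients(accesses, threshold=3):
    """Return clients that touched at least threshold distinct tags under
    one tags directory.

    accesses is an iterable of (client, path) pairs (an assumed format).
    """
    seen = defaultdict(set)  # (client, tags directory) -> tag names touched
    for client, path in accesses:
        parts = path.strip("/").split("/")
        if "tags" in parts:
            i = parts.index("tags")
            if i + 1 < len(parts):
                tags_dir = "/" + "/".join(parts[:i + 1])
                seen[(client, tags_dir)].add(parts[i + 1])
    return sorted({client for (client, _), tags in seen.items()
                   if len(tags) >= threshold})
```

Something like this would only point at candidates; someone would still have
to decide whether to block them.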

But even if this was easier, it'd still be nice if we could just stop
them from doing it in the first place without actually blocking
anonymous access totally.

-garrett



Re: Ways to keep users from checking out too much.

Posted by "C. Michael Pilato" <cm...@collab.net>.
Garrett Rooney wrote:
> How would people feel about some mechanism for stopping update reports
> rooted at particular directories?  It might at least prevent the
> accidental foot shooting you get when an inexperienced user first
> tries their hand at a svn checkout, and that would be a nice step in
> the right direction.

That is a very odd feature request if expressed as something you'd want
to put into Subversion itself.  It's the kind of thing that would work
well if we had (as many folks have asked for) read hooks in the
repository, but ...

Are you sure you aren't asking for a Band-Aid (tm) to cover up some
performance or cancel-ability problems in Subversion that would be
better fixed directly?

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand