
Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2003/07/14 19:10:02 UTC

timeout mysteries SOLVED! (was Re: Large Repositories)

"Johnson, Graham" <gj...@alpineaccess.com> writes:

> I have SVN 0.25.0 and a ~210MB repository installed on two different
> servers -- a Sun Ultra-250 running Solaris 8, and an AMD Athlon machine
> of unknown speed running Linux.  The biggest operation I've tried so far
> in SVN, checking out a directory of ~3000 files, takes ~7 minutes on the
> Sun before the client starts listing the first files being checked out;
> on the AMD it is ~4 minutes.  I have my timeout set to 600.

OK, there have been a whole lot of reports over the last month about
people getting HTTP "timeout" errors from their svn clients:

  * some people have noticed large checkouts and updates timing out,
    or sitting for minutes before files start to appear ("timed out
    waiting for REPORT response")

  * others have noticed that huge commits or imports can produce
    timeouts, especially if adding/changing many files ("timed out
    waiting for MERGE response")

Well, the mystery is solved!  After hours of my being able to
intermittently reproduce the bugs (and pulling out all my remaining
hair), I finally figured out the culprit: it's gzip compression.

Specifically, apache's mod_deflate filter is *blocking* all streamy
responses from the server.  For a very large checkout, update, or
commit, it's entirely reasonable that it take several minutes for the
server to generate a description; normally the response is streamed to
the client, which acts on the data as it comes.  But if the client
sends "Accept-Encoding: gzip", and the server has mod_deflate compiled
in, then apache's mod_deflate buffers the *entire* response before
gzipping it and sending to the client.

Sure enough, if you set "http-compression = no" in the [global]
section of your client's ~/.subversion/servers file, all the timeout
problems vanish.  This is the official workaround, until we fix apache.
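
For reference, the workaround described above would look roughly like this
in the client's ~/.subversion/servers file.  The section and option names
are the ones given in the message; any other settings already in the file
are assumed to stay as they are.

```ini
[global]
http-compression = no
```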

gstein says that jerenkrantz is now going to have to fix mod_deflate. :-)

In any case: if you've been experiencing client timeouts, try this
workaround, and mail the list.  I'd like to know whether things start
working again.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.

--On Monday, July 14, 2003 21:41:29 +0200 "Branko Čibej" <br...@xbc.nu>
wrote:

> Aargh, talk about breakage...
>
> On a related note, is there any chance we could avoid mod_deflate
> completely if we're sending a svndiff? I think using mod_deflate is a
> mistake in the first place. If we want compression, we should just
> self-compress fulltexts with vdelta over ra_dav.

apr_table_set(r->subprocess_env, "no-gzip", "1");

If that's done before the response is written, mod_deflate will get out of 
the way.  You could also remove the DEFLATE output filter in 
r->output_filters, but that'd require you to walk the filter list to find 
it.  -- justin
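
The second option, walking the filter list to find and drop one entry, can
be sketched with a simplified model.  This is not httpd code: filter_node
and remove_filter are hypothetical stand-ins for illustration, the real
ap_filter_t carries far more state, and the names "DAV" and "CORE" used
in the usage note are invented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry in an output filter chain. */
typedef struct filter_node {
    const char *name;
    struct filter_node *next;
} filter_node;

/* Unlink the first node whose name matches and return the (possibly new)
 * head of the chain.  The chain is left untouched if nothing matches. */
static filter_node *remove_filter(filter_node *head, const char *name)
{
    filter_node **link = &head;

    while (*link != NULL) {
        if (strcmp((*link)->name, name) == 0) {
            *link = (*link)->next;   /* splice the node out */
            break;
        }
        link = &(*link)->next;
    }
    return head;
}
```

With a chain DAV -> DEFLATE -> CORE, calling remove_filter(head, "DEFLATE")
leaves DAV -> CORE.  Setting "no-gzip" avoids the walk altogether, which is
why it is listed first.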


Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Branko Čibej <br...@xbc.nu>.

Ben Collins-Sussman wrote:

>Specifically, apache's mod_deflate filter is *blocking* all streamy
>responses from the server.  For a very large checkout, update, or
>commit, it's entirely reasonable that it take several minutes for the
>server to generate a description; normally the response is streamed to
>the client, which acts on the data as it comes.  But if the client
>sends "Accept-Encoding: gzip", and the server has mod_deflate compiled
>in, then apache's mod_deflate buffers the *entire* response before
>gzipping it and sending to the client.
>
Aargh, talk about breakage...

On a related note, is there any chance we could avoid mod_deflate
completely if we're sending a svndiff? I think using mod_deflate is a
mistake in the first place. If we want compression, we should just
self-compress fulltexts with vdelta over ra_dav.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Flushes in httpd was Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Jack Repenning <jr...@collab.net>.

At 2:26 AM -0700 7/15/03, Justin Erenkrantz wrote:
>Either ap_filter_flush is a valid solution, or it should be removed 
>entirely. You can't have it both ways.  Requiring the addition of 
>custom threshold code to *every* filter is a cumbersome requirement. 
>If you really feel that's needed, then we should remove flush 
>support to compensate.  -- justin

Well, it might be that this, in turn, is a tad overboard.  The 
established conventions for network protocols are pretty highly 
optimized and nuanced, with clear and unremovable roles for all these 
pieces.  Transports should certainly deliver data; they're encouraged 
to batch the delivery for efficiency, but they're also responsible to 
send the data sooner or later.  Clients, on the other hand, should 
not have to think about line discipline for the most part, but there 
are occasions when this is necessary.

In this case, it sounds like there's a fundamentally message-based 
service layer, waiting for its packet to be delivered and a response 
to arrive, but using a fundamentally stream-based transport (which is 
still waiting for enough data to make the send worthwhile).  Do I 
read that right? If so, that kind of core-algorithm mismatch is one 
of the reasons for flush calls in the APIs.
-- 
-==-
Jack Repenning
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
o: 650.228.2562
c: 408.835-8090


Re: Flushes in httpd

Posted by Greg Stein <gs...@lyra.org>.

Hah! And we're all idiots, too. :-)

ap_filter_flush() doesn't actually generate a FLUSH bucket. It is simply a
cover for ap_pass_brigade() with a prototype and an assumption about its
void* parameter to make it useful to pass into the apr_brigade_* writing
functions.

IOW, the ap_filter_flush -> ap_pass_brigade change will have no material
impact except to save a function call.
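
A rough model of that relationship, assuming nothing beyond what is said
above: the "flush" function is just the pass function behind a callback
signature with a void* context.  The types and names here (brigade, filter,
pass_brigade, filter_flush) are hypothetical stand-ins, not the httpd
definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a bucket brigade and a filter. */
typedef struct { size_t pending; } brigade;    /* bytes waiting in the brigade */
typedef struct { size_t received; } filter;    /* bytes handed to the filter */

/* Models ap_pass_brigade(): hand the brigade's contents to a filter. */
static int pass_brigade(filter *f, brigade *bb)
{
    f->received += bb->pending;
    bb->pending = 0;
    return 0;
}

/* Models the shape of ap_filter_flush(): the same operation, with the
 * filter smuggled in through a void* so that the function fits a generic
 * "flush callback" prototype.  No flush marker is created anywhere. */
static int filter_flush(brigade *bb, void *ctx)
{
    return pass_brigade((filter *)ctx, bb);
}
```

In this model the two calls are interchangeable from the filter's point of
view, which is the sense in which swapping one for the other only saves a
function call.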

I'll be updating some headers and whatnot in httpd to clarify this, and I'll
go ahead and remove that overhead. But the underlying point still stands:
there is no way that mod_deflate should continue to buffer *ENTIRE*
responses. That is simple insanity...  (not to mention the cause of the
timeouts people have been seeing, and quite probably the large working sets
that people have seen for httpd)

Cheers,
-g

On Tue, Jul 15, 2003 at 10:59:53AM -0700, Greg Stein wrote:
> Right. mod_dav's use of ap_filter_flush() is wrong. It should be using
> ap_pass_brigade() to pass the brigade contents down the filter stack. As it
> stands right now, we are flushing the content to the network via a FLUSH
> bucket, returning from the handler, and Apache is then sending a brigade
> with an EOS bucket. If we just pass our remaining brigade contents (before
> function return in the two places where flush is called), then the EOS will
> ensure the response is flushed to the network.
> 
> (in fact, I'll make this ap_filter_flush -> ap_pass_brigade change later
>  today... it is the right thing to do)
>...

-- 
Greg Stein, http://www.lyra.org/


Re: Flushes in httpd

Posted by Greg Stein <gs...@lyra.org>.

Right. mod_dav's use of ap_filter_flush() is wrong. It should be using
ap_pass_brigade() to pass the brigade contents down the filter stack. As it
stands right now, we are flushing the content to the network via a FLUSH
bucket, returning from the handler, and Apache is then sending a brigade
with an EOS bucket. If we just pass our remaining brigade contents (before
function return in the two places where flush is called), then the EOS will
ensure the response is flushed to the network.

(in fact, I'll make this ap_filter_flush -> ap_pass_brigade change later
 today... it is the right thing to do)

Cheers,
-g

On Tue, Jul 15, 2003 at 05:53:29AM -0700, rbb@rkbloom.net wrote:
> 
> I really wasn't going to get into this, but oh well.
> 
> The flush support is almost always used incorrectly in HTTPd.  The purpose
> of the flush bucket is simply to _force_ data to be sent to the user.
> Its purpose is to say: "There is no more data coming for a while, so send
> what you have to the user while I keep generating more data".  That isn't
> the case you have here.
> 
> Here, you have a poorly behaved filter that isn't watching how much it is
> caching.  Take the simple (and inane) case.  I have a 100 MB static file
> that I didn't compress on disk, but I want it sent to the browser using
> mod_deflate.  If mod_deflate buffers the whole thing, then I have a
> problem, because it is going to take a while.  If you "fix" this in
> mod_dav, then _every_ other content generator that sends data through
> mod_deflate needs the same fix.
> 
> Filters have the responsibility to watch how much they buffer and take
> steps to make sure it is as little as possible.  So the fact that mod_dav
> is already doing the wrong thing isn't really a good argument.
> 
> It is really simple.  Take a look back at the archives from when filtering
> was designed, and you will see that we were very clear that filters had to
> be well behaved.  They couldn't just buffer the data until they had it
> all.  Doing so killed server performance and memory use.  We worked very
> hard to make sure that the content_length filter didn't do it, and if
> mod_deflate is doing so, then it is wrong.
> 
> Ryan
> 
> On Tue, 15 Jul 2003, Sander Striker wrote:
> 
> > > From: Justin Erenkrantz [mailto:justin@erenkrantz.com]
> > > Sent: Tuesday, July 15, 2003 11:26 AM
> >
> > > --On Monday, July 14, 2003 11:55 PM -0700 Greg Stein <gs...@lyra.org> wrote:
> > >
> > > > What Ryan said. The handler shouldn't ever say "force this to the network".
> > >
> > > Which I think is ridiculous for either of you to say when mod_dav already
> > > calls ap_filter_flush to do exactly that.  As I said earlier, I have a hunch
> > > that it just needs one more flush call to solve this particular timeout
> > > problem (right after the invocation of dav_send_one_response).
> >
> > I agree with Justin here.  If mod_dav wouldn't already have flush calls I
> > would see your point, but this is just plain silly.
> >
> > Also, mod_dav knows when it has reached a chunk boundary that makes sense
> > to the client.  How do we communicate this to the next filter in the chain?
> > We don't.  So apart from flushing, how do we prevent an endless wait on
> > the deflate filter?  Certainly, deflate could work with smaller chunks, but
> > then it would be less effective.
> >
> > > Either ap_filter_flush is a valid solution, or it should be removed entirely.
> > > You can't have it both ways.  Requiring the addition of custom threshold code
> > > to *every* filter is a cumbersome requirement.  If you really feel that's
> > > needed, then we should remove flush support to compensate.  -- justin
> >
> > The man's got a point.
> >
> >
> > Sander
> >

-- 
Greg Stein, http://www.lyra.org/


Re: SVN running at Oracle.com

Posted by Mukund <mu...@tessna.com>.

On Tue, Jul 15, 2003 at 02:54:12PM +0200, Wieland Pusch wrote:
| Hello friends,
| 
| I just noticed that
| http://oss.oracle.com/projects/hangcheck-timer/source.html
| is running SVN
| :-)

Apparently a few of the projects there use Subversion (on oss.oracle.com).
Manish Singh (you'll know him from the GIMP project) manages some of the
stuff on it, and he approved Subversion.


-- 

Mukund



SVN running at Oracle.com

Posted by Wieland Pusch <wi...@wielandpusch.de>.

Hello friends,

I just noticed that
http://oss.oracle.com/projects/hangcheck-timer/source.html
is running SVN
:-)

cu
 Wieland                            mailto:wieland@wielandpusch.de



RE: Flushes in httpd was Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by rb...@rkbloom.net.

I really wasn't going to get into this, but oh well.

The flush support is almost always used incorrectly in HTTPd.  The purpose
of the flush bucket is simply to _force_ data to be sent to the user.
Its purpose is to say: "There is no more data coming for a while, so send
what you have to the user while I keep generating more data".  That isn't
the case you have here.

Here, you have a poorly behaved filter that isn't watching how much it is
caching.  Take the simple (and inane) case.  I have a 100 MB static file
that I didn't compress on disk, but I want it sent to the browser using
mod_deflate.  If mod_deflate buffers the whole thing, then I have a
problem, because it is going to take a while.  If you "fix" this in
mod_dav, then _every_ other content generator that sends data through
mod_deflate needs the same fix.

Filters have the responsibility to watch how much they buffer and take
steps to make sure it is as little as possible.  So the fact that mod_dav
is already doing the wrong thing isn't really a good argument.

It is really simple.  Take a look back at the archives from when filtering
was designed, and you will see that we were very clear that filters had to
be well behaved.  They couldn't just buffer the data until they had it
all.  Doing so killed server performance and memory use.  We worked very
hard to make sure that the content_length filter didn't do it, and if
mod_deflate is doing so, then it is wrong.

Ryan

On Tue, 15 Jul 2003, Sander Striker wrote:

> > From: Justin Erenkrantz [mailto:justin@erenkrantz.com]
> > Sent: Tuesday, July 15, 2003 11:26 AM
>
> > --On Monday, July 14, 2003 11:55 PM -0700 Greg Stein <gs...@lyra.org> wrote:
> >
> > > What Ryan said. The handler shouldn't ever say "force this to the network".
> >
> > Which I think is ridiculous for either of you to say when mod_dav already
> > calls ap_filter_flush to do exactly that.  As I said earlier, I have a hunch
> > that it just needs one more flush call to solve this particular timeout
> > problem (right after the invocation of dav_send_one_response).
>
> I agree with Justin here.  If mod_dav wouldn't already have flush calls I
> would see your point, but this is just plain silly.
>
> Also, mod_dav knows when it has reached a chunk boundary that makes sense
> to the client.  How do we communicate this to the next filter in the chain?
> We don't.  So apart from flushing, how do we prevent an endless wait on
> the deflate filter?  Certainly, deflate could work with smaller chunks, but
> then it would be less effective.
>
> > Either ap_filter_flush is a valid solution, or it should be removed entirely.
> > You can't have it both ways.  Requiring the addition of custom threshold code
> > to *every* filter is a cumbersome requirement.  If you really feel that's
> > needed, then we should remove flush support to compensate.  -- justin
>
> The man's got a point.
>
>
> Sander
>


RE: Flushes in httpd was Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Sander Striker <st...@apache.org>.

> From: Justin Erenkrantz [mailto:justin@erenkrantz.com]
> Sent: Tuesday, July 15, 2003 11:26 AM

> --On Monday, July 14, 2003 11:55 PM -0700 Greg Stein <gs...@lyra.org> wrote:
> 
> > What Ryan said. The handler shouldn't ever say "force this to the network".
> 
> Which I think is ridiculous for either of you to say when mod_dav already 
> calls ap_filter_flush to do exactly that.  As I said earlier, I have a hunch 
> that it just needs one more flush call to solve this particular timeout 
> problem (right after the invocation of dav_send_one_response).

I agree with Justin here.  If mod_dav wouldn't already have flush calls I
would see your point, but this is just plain silly.

Also, mod_dav knows when it has reached a chunk boundary that makes sense
to the client.  How do we communicate this to the next filter in the chain?
We don't.  So apart from flushing, how do we prevent an endless wait on
the deflate filter?  Certainly, deflate could work with smaller chunks, but
then it would be less effective.

> Either ap_filter_flush is a valid solution, or it should be removed entirely. 
> You can't have it both ways.  Requiring the addition of custom threshold code 
> to *every* filter is a cumbersome requirement.  If you really feel that's 
> needed, then we should remove flush support to compensate.  -- justin

The man's got a point.

Sander


Flushes in httpd was Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.

--On Monday, July 14, 2003 11:55 PM -0700 Greg Stein <gs...@lyra.org> wrote:

> What Ryan said. The handler shouldn't ever say "force this to the network".

Which I think is ridiculous for either of you to say when mod_dav already 
calls ap_filter_flush to do exactly that.  As I said earlier, I have a hunch 
that it just needs one more flush call to solve this particular timeout 
problem (right after the invocation of dav_send_one_response).

Either ap_filter_flush is a valid solution, or it should be removed entirely. 
You can't have it both ways.  Requiring the addition of custom threshold code 
to *every* filter is a cumbersome requirement.  If you really feel that's 
needed, then we should remove flush support to compensate.  -- justin


Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Jul 14, 2003 at 01:01:50PM -0700, rbb@rkbloom.net wrote:
> On Mon, 14 Jul 2003, Justin Erenkrantz wrote:
>...
> > The correct solution, IMHO, is to have mod_dav pass flush buckets down the
> > output chain by using ap_rflush or sending down its own flush buckets via
> > ap_filter_flush.  The function that looks like it definitely needs the
> > flush is dav_send_one_response.  Not sure if that'll catch all of your
> > cases or not - it may need to be added other places.
> >
> > mod_dav knows when it is appropriate to flush the content.  mod_deflate
> > doesn't attempt to make any determinations on its own when to flush.
> > Mainly because doing the flush may be expensive in zlib - we don't want to
> > do it on every deflate() call.  That logic is best placed in the handler -
> > mod_dav.  And, for the most part, mod_dav does create flush buckets - it
> > just seems to be missing a few obvious cases.  -- justin
> 
> Sorry Justin, but that is bogus.  The whole point of the handler/filter
> flow is that the handler shouldn't have to control stuff like this.  The
> filter needs to realize that it has buffered far too much data and flush
> it appropriately.  Take a look at the core_output_filter.  It doesn't wait
> for a FLUSH bucket from the handler.  It notices that it has buffered 8k
> of data and takes the initiative to get it to the network ASAP.  If you
> push this up to the handler then every handler will need special logic to
> deal with every filter.

*giggle*

  "I agree whole-heartedly with Ryan here"

*snicker*

What Ryan said. The handler shouldn't ever say "force this to the network".

Cheers,
-g

p.s. for the curious... Ryan and I (historically) didn't see eye-to-eye
often... hehe...

-- 
Greg Stein, http://www.lyra.org/


Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by rb...@rkbloom.net.


On Mon, 14 Jul 2003, Justin Erenkrantz wrote:

> --On Monday, July 14, 2003 14:10:02 -0500 Ben Collins-Sussman
> <su...@collab.net> wrote:
>
> > Sure enough, if you set "http-compression = no" in the [global]
> > section of your client's ~/.subversion/servers file, all the timeout
> > problems vanish.  This is the official workaround, until we fix apache.
> >
> > gstein says that jerenkrantz is now going to have to fix mod_deflate. :-)
>
> The correct solution, IMHO, is to have mod_dav pass flush buckets down the
> output chain by using ap_rflush or sending down its own flush buckets via
> ap_filter_flush.  The function that looks like it definitely needs the
> flush is dav_send_one_response.  Not sure if that'll catch all of your
> cases or not - it may need to be added other places.
>
> mod_dav knows when it is appropriate to flush the content.  mod_deflate
> doesn't attempt to make any determinations on its own when to flush.
> Mainly because doing the flush may be expensive in zlib - we don't want to
> do it on every deflate() call.  That logic is best placed in the handler -
> mod_dav.  And, for the most part, mod_dav does create flush buckets - it
> just seems to be missing a few obvious cases.  -- justin

Sorry Justin, but that is bogus.  The whole point of the handler/filter
flow is that the handler shouldn't have to control stuff like this.  The
filter needs to realize that it has buffered far too much data and flush
it appropriately.  Take a look at the core_output_filter.  It doesn't wait
for a FLUSH bucket from the handler.  It notices that it has buffered 8k
of data and takes the initiative to get it to the network ASAP.  If you
push this up to the handler then every handler will need special logic to
deal with every filter.

Ryan
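
The behaviour argued for here, a filter that limits its own buffering, can
be sketched as follows.  This is a standalone model, not code from
core_output_filter: threshold_filter, filter_write and filter_pass are
hypothetical names, and the only figure taken from the message is the 8k
threshold.

```c
#include <stddef.h>
#include <string.h>

#define FLUSH_THRESHOLD 8192   /* the 8k figure mentioned above */

/* Hypothetical filter state: a bounded buffer plus counters that stand in
 * for "data handed to the next filter in the chain". */
typedef struct {
    char     buf[FLUSH_THRESHOLD];
    size_t   used;     /* bytes currently held back */
    size_t   sent;     /* bytes already passed downstream */
    unsigned passes;   /* number of downstream writes so far */
} threshold_filter;

/* Pass whatever is buffered downstream.  A real filter would hand the
 * bytes to the next filter here; the model only counts them. */
static void filter_pass(threshold_filter *f)
{
    if (f->used > 0) {
        f->sent += f->used;
        f->passes++;
        f->used = 0;
    }
}

/* Accept data from the handler.  The filter itself notices when the buffer
 * is full and passes it on, without waiting for the handler to ask. */
static void filter_write(threshold_filter *f, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = FLUSH_THRESHOLD - f->used;
        size_t n = len < room ? len : room;

        memcpy(f->buf + f->used, data, n);
        f->used += n;
        data += n;
        len -= n;

        if (f->used == FLUSH_THRESHOLD)
            filter_pass(f);
    }
}
```

The handler only ever calls filter_write(); the amount held back never
exceeds the threshold, however long the response runs.  The end of the
response is modelled by one final filter_pass().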



Re: timeout mysteries SOLVED! (was Re: Large Repositories)

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.

--On Monday, July 14, 2003 14:10:02 -0500 Ben Collins-Sussman 
<su...@collab.net> wrote:

> Sure enough, if you set "http-compression = no" in the [global]
> section of your client's ~/.subversion/servers file, all the timeout
> problems vanish.  This is the official workaround, until we fix apache.
>
> gstein says that jerenkrantz is now going to have to fix mod_deflate. :-)

The correct solution, IMHO, is to have mod_dav pass flush buckets down the 
output chain by using ap_rflush or sending down its own flush buckets via 
ap_filter_flush.  The function that looks like it definitely needs the 
flush is dav_send_one_response.  Not sure if that'll catch all of your 
cases or not - it may need to be added other places.

mod_dav knows when it is appropriate to flush the content.  mod_deflate 
doesn't attempt to make any determinations on its own when to flush. 
Mainly because doing the flush may be expensive in zlib - we don't want to 
do it on every deflate() call.  That logic is best placed in the handler - 
mod_dav.  And, for the most part, mod_dav does create flush buckets - it 
just seems to be missing a few obvious cases.  -- justin
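
The handler-driven approach proposed here can be modelled in a few lines.
This is a sketch under stated assumptions, not mod_dav or mod_deflate code:
buffering_filter, filter_data, filter_flush_marker and send_responses are
hypothetical names, and the filter is assumed to release data only when it
sees a flush marker.

```c
#include <stddef.h>

/* Hypothetical filter that holds everything until told to flush. */
typedef struct {
    size_t held;      /* bytes buffered inside the filter */
    size_t on_wire;   /* bytes the client has received so far */
} buffering_filter;

/* Data arriving from the handler is simply held. */
static void filter_data(buffering_filter *f, size_t len)
{
    f->held += len;
}

/* A flush marker releases everything held so far. */
static void filter_flush_marker(buffering_filter *f)
{
    f->on_wire += f->held;
    f->held = 0;
}

/* Handler: emit n responses of len bytes each, optionally sending a
 * flush marker after every one (the behaviour suggested for
 * dav_send_one_response). */
static void send_responses(buffering_filter *f, size_t n, size_t len,
                           int flush_each)
{
    size_t i;

    for (i = 0; i < n; i++) {
        filter_data(f, len);
        if (flush_each)
            filter_flush_marker(f);
    }
}
```

With flush_each set, the client sees each response as soon as it is
generated; without it, nothing reaches the wire until something else
forces the filter to let go.  The trade-off named above is that every
flush costs something in zlib, so the handler has to pick its moments.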
