You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Bojan Smojver <bo...@rexursive.com> on 2002/09/16 15:44:36 UTC

[PATCH]: mod_logio.c

This patch is against the current CVS and it enables logging the number
of input/output bytes per request, which is then useful for producing
actual statistics per virtual host or whatever else.

I have tested the module lightly and it works for me. The trick was to
employ both connection and request based filters and link them together.
I did that through connection notes. I'm not sure if this is the correct
way of doing things, but it does seem to produce accurate numbers in my
tests. I would really appreciate if someone could take a look and verify
if this is something that can be worked with.

The quick configuration is:

-------------------------
SetInputFilter LOGIO
SetOutputFilter LOGIO
-------------------------

And also:

-------------------------
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
%I %O" combinedio
-------------------------

Of course, I'll be documenting the whole thing in the next few days...

Bojan

PS. Thanks everyone, especially Justin, for the hints and help!

Re: [PATCH]: mod_logio.c

Posted by Bojan Smojver <bo...@rexursive.com>.
Why is it that every time I submit a patch, there is at least one bug in
it (don't bother to answer, rhetorical)? Apologies everyone, the
previous patch wouldn't work with HTTP/1.0 and HTTP/0.9. This one
should...

Bojan

On Mon, 2002-09-16 at 23:44, Bojan Smojver wrote:
> This patch is against the current CVS and it enables logging the number
> of input/output bytes per request, which is then useful for producing
> actual statistics per virtual host or whatever else.
> 
> I have tested the module lightly and it works for me. The trick was to
> employ both connection and request based filters and link them together.
> I did that through connection notes. I'm not sure if this is the correct
> way of doing things, but it does seem to produce accurate numbers in my
> tests. I would really appreciate if someone could take a look and verify
> if this is something that can be worked with.
> 
> The quick configuration is:
> 
> -------------------------
> SetInputFilter LOGIO
> SetOutputFilter LOGIO
> -------------------------
> 
> And also:
> 
> -------------------------
> LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
> %I %O" combinedio
> -------------------------
> 
> Of course, I'll be documenting the whole thing in the next few days...
> 
> Bojan
> 
> PS. Thanks everyone, especially Justin, for the hints and help!

Re: [PATCH]: mod_logio.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Tue, 2002-09-17 at 16:20, Justin Erenkrantz wrote:

> Is there a reason that everything can't be done on the conn_rec?
> What's the reason for having the request-based input filter again?
> (Please refresh my old and decrepit memory!)  Can't we do everything
> in the connection filter?  Isn't this just going to duplicate
> everything (i.e. the conn filter sees it once, then the request
> filter sees it again)?

In the connection level filter request is NULL, so we can't store
anything into it. We need the request based filter to tie the counted
number of input bytes with the particular request.

I figured out what's going on with double counting. My test numbers were
correct and yet connection level filter was able to see all data, which
means that body of the request would get counted twice - and yet it
wasn't. The reason for that is very simple - I did some RTFM but I
wasn't paying attention to the fact that ap_get_brigade() has to be the
*very first* call in the filter... Silly!

Anyway, the correct thing to do is to count *only* in connection level
filter and then just move the note across from c->notes to r->notes in
the request level filter. I've corrected that part of the code.

Also, I think that allowing SetInputFilter and SetOutputFilter options
once the module is compiled in might create problems. If the same
connection is used for 2 different virtual hosts and one of them doesn't
have the filter configured, the c->notes will never get unset thus
creating problem. So, I think I'll use the explicit hooks instead. I've
changed that part of the code as well.

> In fact, if we make it a request filter, aren't connection-level
> filters such as SSL make the value different?  For a SSL connection,
> we have to set these values *before* (on input) or *after* (on
> output) SSL gets it.  We're only concerned with the actual bytes
> transmitted on the network socket.

Yep, you are right there as well. SSL will definitely change that. It
should be OK for input (I fixed that part), since the counting is done
only in the connection level filter. But, I'll have to fix up the output
for sure (i.e. count in the connection level filter). Not sure yet how
to do that, but will keep trying :-)

> I still think we can work it out so that the connection filters
> have a pointer to the request_rec.  (i.e. ap_read_request could
> update a pointer within the conn to indicate the current request
> *or* update all of the current input filters to point at the new
> request_rec.)  I think we can manage something that won't get
> vetoed.

If I knew how to pull that one off, I would definitely go for it :-)
Promise to have a look though.

> And, again, I'm totally in favor of having this filter update/modify
> r->bytes_sent/r->read_length (perhaps replacing read_length in favor
> of bytes_read to be consistent).  I maintain the definition of
> bytes_sent excluding the headers is remarkably lame and no one ever
> really wants that anyway.
> 
> Has anyone said it should remain broken?  If so, why?  -- justin

Actually, yes. I think someone mentioned in one of the previous threads
that r->bytes_sent should be *only* the actual length of the sent body,
not the headers. This filter would screw that up.

When I thought about all this a bit more, it seemed like a bit of an
upset to introduce two new fields into request_rec (given that
bytes_sent would have to be kept as is). Furthermore, not many people
would opt for this kind of logging, so the fields would do nothing for
them. On the other hand, notes are strings and in mod_log_config.c they
don't have to be converted into anything else but just passed on as they
are. Probably a bit slower then the request_rec field approach, but I'm
guessing it shouldn't be all that bad. And it won't upset the people
that don't want request_rec increased in size because of this.

Once again, thank you for your input. I really appreciate it. And thank
you for your patience while I'm learning Apache API!

Bojan


Re: [PATCH]: mod_logio.c

Posted by Justin Erenkrantz <je...@apache.org>.
On Mon, Sep 16, 2002 at 11:44:36PM +1000, Bojan Smojver wrote:
> This patch is against the current CVS and it enables logging the number
> of input/output bytes per request, which is then useful for producing
> actual statistics per virtual host or whatever else.
> 
> I have tested the module lightly and it works for me. The trick was to
> employ both connection and request based filters and link them together.
> I did that through connection notes. I'm not sure if this is the correct
> way of doing things, but it does seem to produce accurate numbers in my
> tests. I would really appreciate if someone could take a look and verify
> if this is something that can be worked with.

Is there a reason that everything can't be done on the conn_rec?
What's the reason for having the request-based input filter again?
(Please refresh my old and decrepit memory!)  Can't we do everything
in the connection filter?  Isn't this just going to duplicate
everything (i.e. the conn filter sees it once, then the request
filter sees it again)?

In fact, if we make it a request filter, aren't connection-level
filters such as SSL make the value different?  For a SSL connection,
we have to set these values *before* (on input) or *after* (on
output) SSL gets it.  We're only concerned with the actual bytes
transmitted on the network socket.

I still think we can work it out so that the connection filters
have a pointer to the request_rec.  (i.e. ap_read_request could
update a pointer within the conn to indicate the current request
*or* update all of the current input filters to point at the new
request_rec.)  I think we can manage something that won't get
vetoed.

And, again, I'm totally in favor of having this filter update/modify
r->bytes_sent/r->read_length (perhaps replacing read_length in favor
of bytes_read to be consistent).  I maintain the definition of
bytes_sent excluding the headers is remarkably lame and no one ever
really wants that anyway.

Has anyone said it should remain broken?  If so, why?  -- justin