Posted to modules-dev@httpd.apache.org by Andrej van der Zee <an...@gmail.com> on 2010/11/26 01:39:35 UTC

Apache log modules

Hi,

I am looking for a way to deduce the concept of a "transaction" from
the Apache log. What I mean is that I want to group HTTP requests that
are sent by one particular client, for example when a user clicks a
link in the browser. Then I want to be able to group all the HTTP
requests that are the result of that one click (only taking into
account requests that are going to our servers).

Assuming that both client and server have KeepAlive enabled, I
thought that maybe a custom Apache log module could write the
connection ID (if such a thing exists) to the log in order to
distinguish different clients. Moreover, assuming that users wait for
at least 1 second between clicks, I should be able to deduce
transactions by grouping them on timestamps that fall within a second.
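
A minimal sketch of the grouping idea above, in Python, assuming the log
has already been reduced to (connection ID, timestamp in seconds) pairs.
The function name, the record layout and the 1-second default are
illustrative assumptions, not something an existing Apache module
provides.

```python
from collections import defaultdict


def group_transactions(records, gap=1.0):
    """Group (conn_id, timestamp) pairs into per-connection transactions.

    A new transaction starts whenever a request arrives `gap` seconds or
    more after the previous request on the same connection.
    """
    by_conn = defaultdict(list)
    for conn_id, ts in records:
        by_conn[conn_id].append(ts)

    transactions = []
    for conn_id, stamps in by_conn.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] >= gap:
                # The pause is long enough to count as a new click.
                transactions.append((conn_id, current))
                current = [ts]
            else:
                current.append(ts)
        transactions.append((conn_id, current))
    return transactions
```

The sketch only works to the extent that one click maps to one
connection, which is exactly the assumption being asked about here.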

The scheme with the connection ID, can it work, or am I misjudging
something completely? Are there alternatives?

Thank you,
Andrej

Re: Apache log modules

Posted by Stefan Ruppert <sr...@myarm.com>.
Hi Andrej,

the concept of a transaction is defined in the Application Response
Measurement (ARM) standard. The mod_arm4 module implements an interface
between Apache httpd and the ARM standard. ARM has the capability to
correlate transactions in distributed environments, and with an
ARM-enabled HTTP client you get everything you want.

Also, we at MyARM are working on a Firefox extension which will measure
every HTTP request sent by the browser, and these transactions will be
correlated with the HTTP requests measured in Apache httpd using the
mod_arm4 module.

Regards,
Stefan




Re: Apache log modules

Posted by Sorin Manolache <so...@gmail.com>.
On Mon, Nov 29, 2010 at 00:56, Andrej van der Zee
<an...@gmail.com> wrote:
> Hi Sorin,
>
> Thanks for your reply.
>
>>
>> request_rec->connection->id is a long int that is unique. It is built
>> from the process_id and thread_id of the apache thread that serves the
>> request.
>
> Will this be unique for MPM worker across control processes / worker threads?

Yes. It's unique across sites too as it contains the server IP address as well.

>> However, a client may open several connections to the server
>> during the same transaction, so I guess this does not help you much.
>
> My assumption is that both client and server have KeepAlive enabled.
> In that case, should there "generally" not be just one connection
> only?

As other people remarked, browsers open several connections even if
they and the server support keepalive. Just clear your browser cache
and surf to any site while running netstat in a console.

>> There's a module called unique_id. It creates a string that is stored
>> in the req->subprocess_env and can be logged with "%{UNIQUE_ID}e". It
>> encodes the request timestamp, the connection->id, the _server_ IP and
>> a random number. It does not encode the client IP or port. However, if
>> you combine it with the client-IP and client port that you can log as
>> well in the same log line, you could, probably, extract what you want
>> after some log-postprocessing.
>
> So I can decode the unique ID and extract the connection ID from it? I
> guess so...
>
>>
>> Different clients behind a NAT router will use different ports.
>> However, based solely on ports, you won't be able to distinguish
>> between two different clients on one hand or one client that makes
>> several connections on the other hand. A method that is used by some
>> of my colleagues in order to distinguish between different clients
>> (but I don't know much about it, so I can't tell you more) is to
>> analyse the TCP header of the packet in order to extract the TCP
>> sequence numbers.
>
> That sounds interesting but as far as I can see this information is
> not available in the application layer. Hard to imagine, but are there
> any tricks to get this information in Apache modules? Or are your
> colleagues using sniffers such as tcpdump?

Indeed it is not available at the application layer. I do not know how
they do it.

S

Re: Apache log modules

Posted by rm...@tuxteam.de.
On Mon, Nov 29, 2010 at 08:56:50AM +0900, Andrej van der Zee wrote:
> Hi Sorin,
> 
> Thanks for your reply.
> 
> >
> > request_rec->connection->id is a long int that is unique. It is built
> > from the process_id and thread_id of the apache thread that serves the
> > request.
> 
> Will this be unique for MPM worker across control processes / worker threads?
> 
> > However, a client may open several connections to the server
> > during the same transaction, so I guess this does not help you much.
> 
> My assumption is that both client and server have KeepAlive enabled.
> In that case, should there "generally" not be just one connection
> only?

This assumption is most likely wrong. Even with keep-alive, browsers will
open more than one connection. After all, keep-alive and multi-connect
solve different problems.

> >
> > There's a module called unique_id. It creates a string that is stored
> > in the req->subprocess_env and can be logged with "%{UNIQUE_ID}e". It
> > encodes the request timestamp, the connection->id, the _server_ IP and
> > a random number. It does not encode the client IP or port. However, if
> > you combine it with the client-IP and client port that you can log as
> > well in the same log line, you could, probably, extract what you want
> > after some log-postprocessing.
> 
> So I can decode the unique ID and extract the connection ID from it? I
> guess so...
> 
> >
> > Different clients behind a NAT router will use different ports.
> > However, based solely on ports, you won't be able to distinguish
> > between two different clients on one hand or one client that makes
> > several connections on the other hand. A method that is used by some
> > of my colleagues in order to distinguish between different clients
> > (but I don't know much about it, so I can't tell you more) is to
> > analyse the TCP header of the packet in order to extract the TCP
> > sequence numbers.
> 
> That sounds interesting but as far as I can see this information is
> not available in the application layer. Hard to imagine, but are there
> any tricks to get this information in Apache modules? Or are your
> colleagues using sniffers such as tcpdump?

No, packet assembly is the operating system's job - Apache sees a
stream of octets, not packets.

 HTH  Ralf Mattes


> 
> Thank you,
> Andrej

Re: Apache log modules

Posted by Andrej van der Zee <an...@gmail.com>.
Hi Sorin,

Thanks for your reply.

>
> request_rec->connection->id is a long int that is unique. It is built
> from the process_id and thread_id of the apache thread that serves the
> request.

Will this be unique for MPM worker across control processes / worker threads?

> However, a client may open several connections to the server
> during the same transaction, so I guess this does not help you much.

My assumption is that both client and server have KeepAlive enabled.
In that case, should there "generally" not be just one connection
only?

>
> There's a module called unique_id. It creates a string that is stored
> in the req->subprocess_env and can be logged with "%{UNIQUE_ID}e". It
> encodes the request timestamp, the connection->id, the _server_ IP and
> a random number. It does not encode the client IP or port. However, if
> you combine it with the client-IP and client port that you can log as
> well in the same log line, you could, probably, extract what you want
> after some log-postprocessing.

So I can decode the unique ID and extract the connection ID from it? I
guess so...

>
> Different clients behind a NAT router will use different ports.
> However, based solely on ports, you won't be able to distinguish
> between two different clients on one hand or one client that makes
> several connections on the other hand. A method that is used by some
> of my colleagues in order to distinguish between different clients
> (but I don't know much about it, so I can't tell you more) is to
> analyse the TCP header of the packet in order to extract the TCP
> sequence numbers.

That sounds interesting but as far as I can see this information is
not available in the application layer. Hard to imagine, but are there
any tricks to get this information in Apache modules? Or are your
colleagues using sniffers such as tcpdump?

Thank you,
Andrej

Re: Apache log modules

Posted by Sorin Manolache <so...@gmail.com>.
On Fri, Nov 26, 2010 at 01:39, Andrej van der Zee
<an...@gmail.com> wrote:
> Hi,
>
> I am looking for a way to deduce the concept of a "transaction" from
> the Apache log. What I mean is that I want to group HTTP requests that
> are sent by one particular client, for example when a user clicks a
> link in the browser. Then I want to be able to group all the HTTP
> requests that are the result of that one click (only taking into
> account requests that are going to our servers).
>
> Assuming that both client and server have KeepAlive enabled, I
> thought that maybe a custom Apache log module could write the
> connection ID (if such a thing exists) to the log in order to
> distinguish different clients. Moreover, assuming that users wait for
> at least 1 second between clicks, I should be able to deduce
> transactions by grouping them on timestamps that fall within a second.
>
> The scheme with the connection ID, can it work, or am I misjudging
> something completely? Are there alternatives?
>
> Thank you,
> Andrej
>

request_rec->connection->id is a long int that is unique. It is built
from the process_id and thread_id of the apache thread that serves the
request. However, a client may open several connections to the server
during the same transaction, so I guess this does not help you much.

There's a module called unique_id. It creates a string that is stored
in the req->subprocess_env and can be logged with "%{UNIQUE_ID}e". It
encodes the request timestamp, the connection->id, the _server_ IP and
a random number. It does not encode the client IP or port. However, if
you combine it with the client-IP and client port that you can log as
well in the same log line, you could, probably, extract what you want
after some log-postprocessing.
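
As a rough illustration of that post-processing, here is a Python sketch
that assumes each log line starts with the client IP, the client port,
the UNIQUE_ID value and the standard bracketed timestamp (roughly what a
format such as "%a %{remote}p %{UNIQUE_ID}e %t" would write). The exact
line layout, the regular expression and the function names are
assumptions made for the example. It treats the unique ID as an opaque
token and does not try to decode it.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: <client ip> <client port> <unique id> [<timestamp>] ...
LINE_RE = re.compile(r'^(\S+) (\d+) (\S+) \[([^\]]+)\]')


def parse_line(line):
    """Return (ip, port, unique_id, datetime) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ip, port, unique_id, stamp = m.groups()
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return ip, int(port), unique_id, when


def group_by_client_socket(lines):
    """Map (client ip, client port) to its requests, sorted by timestamp."""
    groups = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that are not in the assumed layout
        ip, port, unique_id, when = parsed
        groups[(ip, port)].append((when, unique_id))
    for entries in groups.values():
        entries.sort()
    return dict(groups)
```

Each (IP, port) key then stands for one client socket. Which of those
sockets belong to the same browser still has to be inferred separately.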

Different clients behind a NAT router will use different ports.
However, based solely on ports, you won't be able to distinguish
between two different clients on one hand or one client that makes
several connections on the other hand. A method that is used by some
of my colleagues in order to distinguish between different clients
(but I don't know much about it, so I can't tell you more) is to
analyse the TCP header of the packet in order to extract the TCP
sequence numbers.

Sorin

Re: Apache log modules

Posted by Andrej van der Zee <an...@gmail.com>.
Hi,

> My condolences.  I have felt your pain.

Thanks ;)

I just read the documentation of mod_log_config 2.3 and some
logging-options have been added, to my pleasant surprise! I was
thinking I could use:

%a = client's IP address
%{remote}p = client's port

Would this pair uniquely identify one client connection, also for
clients behind one NAT router?

Then, in combination with the option %k this *might* be a solution to
"follow" users (of course, with some restrictions):

%k = Number of keepalive requests handled on this connection.
Interesting if KeepAlive is being used, so that, for example, a '1'
means the first keepalive request after the initial one, '2' the
second, etc...; otherwise this is always 0 (indicating the initial
request).

Would this be feasible? Any comments?
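
To make the idea concrete, here is a minimal Python sketch that assumes
the log has been parsed into (ip, port, k, timestamp) tuples in log
order, where k is the %k value. It uses k == 0 as the marker for the
first request on a connection, so that a client port reused later is not
merged with the earlier connection. The tuple layout and function name
are assumptions made for the example.

```python
def split_connections(records):
    """Split (ip, port, k, ts) records into one list per connection.

    A record with k == 0 is taken as the first request on a new
    connection, even if the same (ip, port) pair was seen before.
    """
    open_conns = {}  # (ip, port) -> records of the connection in progress
    finished = []
    for ip, port, k, ts in records:
        key = (ip, port)
        if k == 0 and key in open_conns:
            # Same client socket seen again with k == 0: the port was reused.
            finished.append(open_conns.pop(key))
        open_conns.setdefault(key, []).append((ip, port, k, ts))
    finished.extend(open_conns.values())
    return finished
```

This only separates connections. It does not tell which connections came
from the same user, which is the harder part of the problem.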

Thank you,
Andrej

Re: Apache log modules

Posted by Andrej van der Zee <an...@gmail.com>.
Hi Ted,

> This is much better done at the application level.

I am aware of that, but that is not an option unfortunately.

Cheers,
Andrej

Re: Apache log modules

Posted by Ted Dunning <te...@gmail.com>.
This is much better done at the application level.  Log a user id of some
kind (usually session cookie based).  Then sort by cookie first, then by
time.  The sorted file is what you need.  For really large logs, some kind
of map-reduce is warranted.
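
A small Python sketch of the "sort by cookie, then by time" step,
assuming each log record has already been reduced to a (session ID,
timestamp, request) tuple. How the session ID gets into the log (for
example by logging a session cookie) and the names used here are
assumptions for the example.

```python
from itertools import groupby
from operator import itemgetter


def sessions_in_time_order(records):
    """Return {session_id: [(ts, request), ...]} with each list sorted by time."""
    # Sort by session id first, then by timestamp within a session.
    ordered = sorted(records, key=itemgetter(0, 1))
    return {
        session_id: [(ts, request) for _, ts, request in group]
        for session_id, group in groupby(ordered, key=itemgetter(0))
    }
```

For logs too large to sort in memory, the same key (session ID, then
time) would be the natural sort key for an external sort or a
map-reduce job.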

On Thu, Nov 25, 2010 at 4:39 PM, Andrej van der Zee <
andrejvanderzee@gmail.com> wrote:

>
> The scheme with the connection ID, can it work, or am I misjudging
> something completely? Are there alternatives?
>
>