Posted to modules-dev@httpd.apache.org by Andrej van der Zee <an...@gmail.com> on 2008/09/05 05:11:39 UTC

Apache modification questions

Hi,

I am about to modify Apache with some custom logging for GET/POST
requests (and more). It is for the purpose of research. If possible, I
would like to get some guidance in how to implement my ideas. I will
explain...

Every GET/POST request to Apache will carry a request identifier.
Adding the identifier to the request is the responsibility of the
developer of the web page. In Apache (NOT the CGI application)  I
would like to extract the identifier from the request and write it to
a log together with timestamp, request serve time and some specific
information about the thread/process that handles the request. The CGI
application serving the request should be untouched! I have two issues
I would like to get some comments about, if possible:

1) What is the "recommended" way to carry the GET/POST request
identifier (inserted by the developer of the web page) from the client
to Apache? Add a custom HTTP header? Or should I do it in GET/POST
variables? Any other alternatives?
2) I do need to "attach" to the thread/process handling the request to
extract information just after starting and just before ending. Can I
do this in an Apache module? I found the ap_hook_child_init() function
but no similar exit()-function. Moreover, I need to access the request
identifier and log to a file. Can all this be done in an Apache
module?

Hope you can help!

Cheers,
Andrej

-- 
Andrej van der Zee
2-40-19 Koenji-minami
Suginami-ku, Tokyo
166-0003 JAPAN
Mobile: 0031-(0)80-65251092
Phone/Fax: 0031-(0)3-3318-3155

Re: Apache modification questions

Posted by Ralf Mattes <rm...@seid-online.de>.
On Fri, 2008-09-05 at 18:49 +0900, Andrej van der Zee wrote:
> Hi,
> 
> >
> > A CGI script is run by the code of a module, mod_perl, mod_php5, etc.
> > In their register_hooks function they register their handler, which is
> > a script interpreter basically. The handler callback is invoked in the
> > same thread that did the rest of the request processing (URL parsing,
> > authentication, fixups, etc). 

I think there is some confusion about terminology here: a
CGI script by definition runs in its own process, _not_ within the
process address space of the webserver (hence the need for a defined
"Common Gateway Interface" to pass request information to the external
application; see http://en.wikipedia.org/wiki/Common_Gateway_Interface).
Now, since forking/spawning is a rather time-consuming operation,
there are embedded interpreters for most of the common scripting
languages (like Perl/Python etc.). Those often have a compatibility
layer to run unmodified CGI code within the webserver's address/process
space (sometimes with strange side effects :-/).
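To make the distinction concrete: a real CGI is started by the server as a separate process, so a PID logged from an Apache hook is the server child's PID, not the CGI's. The standalone POSIX sketch below is not Apache code; it forks a child (standing in for a CGI, without exec'ing anything) and checks that the child reports a different PID than its parent. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, as a server does before exec'ing a classic CGI, and have it
 * report its PID back over a pipe. Returns 1 if the child's PID differs from
 * the parent's, 0 if it does not, -1 on error. */
int cgi_style_child_has_own_pid(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t parent = getpid();
    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: send our own PID to the parent and exit immediately. */
        pid_t me = getpid();
        close(fds[0]);
        ssize_t n = write(fds[1], &me, sizeof me);
        close(fds[1]);
        _exit(n == (ssize_t)sizeof me ? 0 : 1);
    }

    /* Parent: read the reported PID, then reap the child. */
    close(fds[1]);
    pid_t reported = 0;
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (n != (ssize_t)sizeof reported)
        return -1;
    return reported != parent;
}
```

An embedded interpreter, by contrast, runs inside the server process itself, so getpid() there returns the same value the surrounding hooks see.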
 
> > However, I don't know if the handler
> > callback (of mod_perl, mod_php5, etc), which can be seen as a sort of
> > third-party black box, spawns new processes/threads in which they
> > parse the script, compile, etc. I guess they do not spawn new
> > threads/processes but you have to read their docs or their sources in
> > order to be sure. If they do not spawn new threads/processes, then the
> > CGI is executed in the same thread as fixups and the rest of the
> > request processing.
> >
> 
> Thanks that makes sense.
> 
> If I understood correctly, this means that I can add my own module to
> the chain of request processors that executes in the same thread as
> the hook function in mod_php5/mod_perl that executes CGI scripts.

For the embedded interpreters that's true.

> Though, whether the module's hook function spawns a new process/thread for
> handling the CGI script depends on the module.

CGIs are handled by mod_cgi.

> Does anybody know if the hook functions of such modules usually
> spawn a new thread/process? My guess is that at least for compiled
> CGI applications written in C/C++ a new process is forked in the hook
> function.

Iff those are real CGIs (applications) that's true.

 HTH Ralf Mattes

> Cheers,
> Andrej
> 
> 


Re: Apache modification questions

Posted by Dave Ingram <da...@dmi.me.uk>.
Andrej van der Zee wrote:
> Does anybody know if the hook functions of such modules usually
> spawn a new thread/process? My guess is that at least for compiled
> CGI applications written in C/C++ a new process is forked in the hook
> function.
>   

This is just off the top of my head, so I have no solid proof for this,
and please correct me if I'm wrong.

I would think that the reason for having an Apache module for PHP/Perl
would be to avoid the overhead of initialising the interpreter for every
page - it initialises itself once at Apache start, and then resets its
state for each request. Or so I would hope. Using a module also gives
them access to some Apache internals that CGI applications can't reach.
CGI applications (including mod_suphp and PHP/Perl run as CGI scripts)
would be forked.


Dave

Re: Apache modification questions

Posted by Andrej van der Zee <an...@gmail.com>.
Hi,

>
> A CGI script is run by the code of a module, mod_perl, mod_php5, etc.
> In their register_hooks function they register their handler, which is
> a script interpreter basically. The handler callback is invoked in the
> same thread that did the rest of the request processing (URL parsing,
> authentication, fixups, etc). However, I don't know if the handler
> callback (of mod_perl, mod_php5, etc), which can be seen as a sort of
> third-party black box, spawns new processes/threads in which they
> parse the script, compile, etc. I guess they do not spawn new
> threads/processes but you have to read their docs or their sources in
> order to be sure. If they do not spawn new threads/processes, then the
> CGI is executed in the same thread as fixups and the rest of the
> request processing.
>

Thanks that makes sense.

If I understood correctly, this means that I can add my own module to
the chain of request processors that executes in the same thread as
the hook function in mod_php5/mod_perl that executes CGI scripts.
Though, whether the module's hook function spawns a new process/thread for
handling the CGI script depends on the module.

Does anybody know if the hook functions of such modules usually
spawn a new thread/process? My guess is that at least for compiled
CGI applications written in C/C++ a new process is forked in the hook
function.

Cheers,
Andrej



Re: Apache modification questions

Posted by Sorin Manolache <so...@gmail.com>.
On Fri, Sep 5, 2008 at 11:19, Andrej van der Zee
<an...@gmail.com> wrote:
> Hi,
>
> Thanks for your comments.
>
>>
>> child_init is not the appropriate hook for your purpose. Use
>> ap_hook_fixups for getting the ID and ap_hook_log_transaction for
>> logging.
>
> In ap_hook_fixups, is it possible to get the thread/process ID of the
> CGI application serving the request? Moreover, is this thread/process
> already created? Or maybe the hook function is executed in the same
> thread/process as the CGI application?

A CGI script is run by the code of a module, mod_perl, mod_php5, etc.
In their register_hooks function they register their handler, which is
a script interpreter basically. The handler callback is invoked in the
same thread that did the rest of the request processing (URL parsing,
authentication, fixups, etc). However, I don't know if the handler
callback (of mod_perl, mod_php5, etc), which can be seen as a sort of
third-party black box, spawns new processes/threads in which they
parse the script, compile, etc. I guess they do not spawn new
threads/processes but you have to read their docs or their sources in
order to be sure. If they do not spawn new threads/processes, then the
CGI is executed in the same thread as fixups and the rest of the
request processing.

S

Re: Apache modification questions

Posted by Andrej van der Zee <an...@gmail.com>.
Hi,

Thanks for your comments.

>
> child_init is not the appropriate hook for your purpose. Use
> ap_hook_fixups for getting the ID and ap_hook_log_transaction for
> logging.

In ap_hook_fixups, is it possible to get the thread/process ID of the
CGI application serving the request? Moreover, is this thread/process
already created? Or maybe the hook function is executed in the same
thread/process as the CGI application?

>
> Every module has a "register_hooks" function. There, you call the two
> ap_hook functions above in order to hook your callbacks to the fixups
> and log_transaction events. Next you implement the two callbacks and
> that's it.
>

That's clear.

Cheers,
Andrej



Re: Apache modification questions

Posted by Sorin Manolache <so...@gmail.com>.
On Fri, Sep 5, 2008 at 05:11, Andrej van der Zee
<an...@gmail.com> wrote:
> 2) I do need to "attach" to the thread/process handling the request to
> extract information just after starting and just before ending. Can I
> do this in an Apache module? I found the ap_hook_child_init() function
> but no similar exit()-function. Moreover, I need to access the request
> identifier and log to a file. Can all this be done in an Apache
> module?

child_init is not the appropriate hook for your purpose. Use
ap_hook_fixups for getting the ID and ap_hook_log_transaction for
logging.

Every module has a "register_hooks" function. There, you call the two
ap_hook functions above in order to hook your callbacks to the fixups
and log_transaction events. Next you implement the two callbacks and
that's it.
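A minimal standalone model of those two callbacks, assuming the identifier has already been extracted as a string. In an actual module the first function's logic would live in a callback registered with ap_hook_fixups and the second in one registered with ap_hook_log_transaction, with the ID kept in something like r->notes and the start time taken from r->request_time. Here a hypothetical struct stands in for request_rec so the logic compiles and can be tested without the httpd headers; the struct, the function names and the log line format are all illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-request record standing in for request_rec + r->notes. */
struct req_record {
    char id[64];        /* identifier supplied by the web page */
    long long start_us; /* request start time, microseconds */
    long pid;           /* process running the hooks */
};

/* "fixups" step: remember the ID, start time and PID.
 * Returns 0 on success, -1 (think DECLINED) if there is no usable ID. */
int record_fixups(struct req_record *rec, const char *id, long long start_us)
{
    if (id == NULL || strlen(id) >= sizeof rec->id)
        return -1;
    strcpy(rec->id, id);
    rec->start_us = start_us;
    rec->pid = (long)getpid();
    return 0;
}

/* "log_transaction" step: format "start id pid serve_time_us".
 * Returns the line length, or -1 if buf is too small. */
int record_log_line(const struct req_record *rec, long long end_us,
                    char *buf, size_t len)
{
    int n = snprintf(buf, len, "%lld %s %ld %lld", rec->start_us, rec->id,
                     rec->pid, end_us - rec->start_us);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The PID recorded here is that of the process running the hooks; if the request is served by a forked CGI, it is not the CGI's own PID.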

S

Re: Apache modification questions

Posted by Andrej van der Zee <an...@gmail.com>.
Thanks for your comments.

Until I get the book, can you tell me whether a module's hook function can
execute in the same thread as the CGI application that serves the
request?

Also, I am unable to find the apache2 API for building modules. I
found some documentation for developers on the apache2 website, but
the link for "Autogenerated Apache 2 code documentation" is not
working.

Thank you,
Andrej




Re: Apache modification questions

Posted by Andrej van der Zee <an...@gmail.com>.
Hi,

> A post doesn't normally have anything in the
> QUERY_STRING.  Rather a POSTed form has the stuff
> being sent in the request body, which is read from
> STDIN by a script.  QUERY_STRING is still available
> for use.  Consider this form:
>
> <form method=post action=myscript.cgi>
> <input name=emailaddress>
>
>  The "emailaddress" input would NOT appear
> in QUERY_STRING, but rather would appear to
> the script as STDIN, or to another module as
> the request body.  You can still do:
>
> <form method=post action=myscript.cgi?id=reallyunique>
> <input name=emailaddress>
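The same point seen at the protocol level: the query string is part of the request target on the request line, whatever the method, while POSTed form fields travel in the body. A small sketch that extracts the query string from a raw request line (Apache exposes the parsed equivalent as r->args); the function name is hypothetical and the parsing is deliberately minimal.

```c
#include <stddef.h>
#include <string.h>

/* Copy the query string (text after '?') of an HTTP request line such as
 * "POST /myscript.cgi?id=reallyunique HTTP/1.1" into out. The body of a
 * POST is not part of the request line, so form fields never show up here.
 * Returns 1 if a query string was found, 0 otherwise. */
int request_line_query(const char *line, char *out, size_t outlen)
{
    const char *target = strchr(line, ' ');     /* skip the method */
    if (target == NULL)
        return 0;
    target++;
    const char *end = strchr(target, ' ');      /* stop before the version */
    if (end == NULL)
        end = target + strlen(target);
    const char *q = memchr(target, '?', (size_t)(end - target));
    if (q == NULL)
        return 0;
    size_t len = (size_t)(end - q - 1);
    if (len >= outlen)
        return 0;
    memcpy(out, q + 1, len);
    out[len] = '\0';
    return 1;
}
```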



Uhhhh yeah you are right, I am not thinking.

Cheers,
Andrej

Re: Apache modification questions

Posted by Andrej van der Zee <an...@gmail.com>.
Hi,


> > 1) What is the "recommended" way to carry the
> > GET/POST request identifier (inserted by the
> > developer of the web page) from the client
> > to Apache?
>
>    You're not going to get the browser to send
> a custom header by any changes you make to the
> page, except possibly using Ajax.  Javascript isn't
> enabled on all browsers, so that's out for a
> public web site.  Cookies also are often turned
> off, so that's out, if it's a public site.
> That leaves the QUERY_STRING, mainly.  (What
> you're calling GET variables).  We have another
> way that's better for many sites, but there are
> patent issues with that method.  (My company
> is seeking a patent.)


Now I am curious, but I bet you are not going to tell me!


> Using the query string
> means rewriting the content of each page, so
> one could do what PHP does and use cookies if
> they are available, then failover to query string
> if cookies are not sent by the browser.
>
>
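Rewriting every page to carry the identifier boils down to appending a parameter to each link and form action. A minimal helper, assuming the URL has no #fragment and that the name and value are already URL-encoded; the function name and the reqid parameter are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Append "name=value" to url as a query parameter, using '?' if the URL
 * has no query string yet and '&' otherwise. Returns 0 on success, -1 if
 * out is too small to hold the result. */
int add_query_param(const char *url, const char *name, const char *value,
                    char *out, size_t outlen)
{
    char sep = strchr(url, '?') ? '&' : '?';
    int n = snprintf(out, outlen, "%s%c%s=%s", url, sep, name, value);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```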
This is a bit awkward, since the body of a POST request can be anything, for
example an XML document. I wonder why it is impossible to add an HTTP
request header in JavaScript. Does anybody know? With AJAX and ActionScript
(or Flex 3) it can be done with a few lines of code. Anyway, I guess I could
write an Apache module with a hook function that attempts to get the
identifier from:

1) Custom HTTP request header
2) A Cookie
3) QUERY_STRING

in this order, and return DECLINED if none is found. Though, how should I
deal with, for example, an XML document in a POST request? Where should I
"hide" the transaction identifier?
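A sketch of that lookup order (header, then cookie, then query string), written as plain C so it can be tested on its own. In a module, the custom header and the Cookie header would come from apr_table_get(r->headers_in, ...) and the query string from r->args; ID_NONE is where the hook would return DECLINED. The parameter name reqid, the enum and the function names are hypothetical, and no URL-decoding is done. Because the query string is independent of the request body, an XML POST could still carry ?reqid=... in its action URL, as suggested earlier in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Where the identifier was found; ID_NONE corresponds to DECLINED. */
enum id_source { ID_NONE = 0, ID_HEADER, ID_COOKIE, ID_QUERY };

/* Look for "name=value" in a list separated by sep, e.g. "a=1&reqid=42"
 * (sep '&') or "a=1; reqid=42" (sep ';'). Leading spaces are skipped.
 * Copies the value into out and returns 1 if found, else 0. */
static int find_pair(const char *list, char sep, const char *name,
                     char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *p = list;
    while (p != NULL && *p != '\0') {
        while (*p == ' ')
            p++;
        const char *end = strchr(p, sep);
        size_t plen = end ? (size_t)(end - p) : strlen(p);
        if (plen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = plen - nlen - 1;
            if (vlen >= outlen)
                return 0;            /* value too long for the buffer */
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}

/* Try the custom header first, then the Cookie header, then the query
 * string. Any of the inputs may be NULL. A header value too long for out
 * is skipped and the next source is tried. */
enum id_source extract_request_id(const char *header, const char *cookie,
                                  const char *query, char *out, size_t outlen)
{
    if (header != NULL && strlen(header) < outlen) {
        strcpy(out, header);
        return ID_HEADER;
    }
    if (cookie != NULL && find_pair(cookie, ';', "reqid", out, outlen))
        return ID_COOKIE;
    if (query != NULL && find_pair(query, '&', "reqid", out, outlen))
        return ID_QUERY;
    return ID_NONE;
}
```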

Best regards,
Andrej