Posted to apache-bugdb@apache.org by Marco Zamora <mz...@cbbanorte.com.mx> on 1997/11/05 02:00:21 UTC
mod_log-any/1358: Selective url-encode of log fields (or maybe a pseudo log_rewrite module?)
>Number: 1358
>Category: mod_log-any
>Synopsis: Selective url-encode of log fields (or maybe a pseudo log_rewrite module?)
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: apache
>State: open
>Class: change-request
>Submitter-Id: apache
>Arrival-Date: Tue Nov 4 17:10:00 PST 1997
>Last-Modified:
>Originator: mzamora@cbbanorte.com.mx
>Organization:
apache
>Release: 1.2x
>Environment:
Linux RedHat 4.1 kernel 2.0.27 on a PPro200Mhz
Apache 1.2b7 (I know, I'll upgrade to 1.2.4 as soon as I upgrade to RH4.3)
>Description:
Situation:
1) The Common and Extended Log Formats put the first line of the request
(and, in the extended format, the referring URL) *between* double quotes.
2) From PR#859 we can deduce that spaces in the requested URL are the
*client's* problem, and by extension that double quotes in the URL are also
the client's problem.
Problem Encountered:
We can't consistently parse an [EC]LF logfile either by whitespace delimiters
(where the URL would ideally be field #7), or by double-quote delimiters
(where the URL would be ws-delim subfield #2 in quote-enclosed field #6).
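A minimal sketch of the two parsing strategies above, in Python, run against one well-formed record and one record whose URL carries raw spaces and double quotes. Both log lines, the addresses, and the function names are made up for illustration; they are not taken from a real logfile.

```python
# Hypothetical CLF records (all values invented for illustration).
clean = '10.0.0.1 - - [04/Nov/1997:17:10:00 -0800] "GET /index.html HTTP/1.0" 200 1234'
messy = '10.0.0.1 - - [04/Nov/1997:17:10:00 -0800] "GET /chat?q="hello world" now HTTP/1.0" 200 1234'

def url_by_whitespace(line):
    # The URL is expected to be whitespace-delimited field #7 (index 6).
    return line.split()[6]

def status_by_whitespace(line):
    # The response status is expected two fields after the URL (index 8).
    return line.split()[8]

def url_by_quotes(line):
    # The URL is expected to be whitespace subfield #2 of the
    # quote-enclosed request field.
    request = line.split('"')[1]
    return request.split()[1]

for name, line in (("clean", clean), ("messy", messy)):
    print(name, url_by_whitespace(line), status_by_whitespace(line), url_by_quotes(line))
```

On the clean record all three functions return what you would expect. On the messy record the whitespace split returns a truncated URL and a status that is really part of the URL, and the quote split cuts the URL off at the first embedded quote.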
Diatribe:
Ok, ok...: URL-encoding of requests is the client's responsibility, but
parsing the #$%& logfiles of broken clients' requests (especially on proxying
servers) turns into the admin's nightmare.
Have you ever parsed proxy logfiles of a bunch of people in one of those
web-chatrooms that do forms with GETs on quote-delimited searches?
(i.e.: you get a bunch of URLs with embedded spaces *and* quotes).
Try to identify the HTTP RESPONSE field in that mess in a consistent manner.
I guarantee you won't be able to.
>How-To-Repeat:
GET a series of arbitrary URLs with embedded spaces and double quotes.
Now, take the logfile and try to identify the METHOD and HTTP RESPONSE fields
without resorting to some sort of heuristic mumbo-jumbo (which eats up CPU
cycles and becomes impractical with logfiles on the order of hundreds of
thousands of records a day).
>Fix:
Implement a url-encode field modifier for mod_log_config.
For example:
CustomLog logs/access_log "%h %l %u %t \"%r\" %s %b"
would give us the existing behaviour, but
CustomLog logs/access_log "%h %l %u %t \"%{Url-Enc %r}\" %s %b"
would url-encode the %r field.
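To illustrate what such a modifier might emit, here is a rough Python sketch. Note that `%{Url-Enc %r}` is only the syntax proposed in this report, not an existing mod_log_config directive, and that `url_enc` is a hypothetical helper. The choice of which characters to escape (whitespace, control characters, double quotes, non-ASCII, and `%` itself so the result can be decoded unambiguously) is an assumption, as is the sample record.

```python
from urllib.parse import unquote

def url_enc(field):
    # Assumed policy: percent-encode '%', '"', and anything outside the
    # printable, non-space ASCII range; pass everything else through.
    out = []
    for ch in field:
        if ch in '%"' or ord(ch) < 0x21 or ord(ch) > 0x7E:
            out.extend('%%%02X' % b for b in ch.encode('utf-8'))
        else:
            out.append(ch)
    return ''.join(out)

# Hypothetical request line with raw spaces and double quotes in the URL.
request = 'GET /chat?q="hello world" now HTTP/1.0'
encoded = url_enc(request)

# Build a record the way the proposed format string would (values invented).
line = '10.0.0.1 - - [04/Nov/1997:17:10:00 -0800] "%s" 200 1234' % encoded
fields = line.split()

print(fields[5])                     # the whole request, as a single token
print(fields[6])                     # the response status
print(unquote(fields[5].strip('"'))) # the original request line, recovered
```

Because the whole %r field is encoded in this sketch, the request becomes a single whitespace-delimited token, so the status lands in the same position on every record, and the original request line can be recovered by decoding.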
(BTW, Apache is a *terrific* work. You all have my eternal gratitude :-%2
>Audit-Trail:
>Unformatted: