You are viewing a plain text version of this content. The canonical link for it is here.
Posted to apache-bugdb@apache.org by Marco Zamora <mz...@cbbanorte.com.mx> on 1997/11/05 02:00:21 UTC

mod_log-any/1358: Selective url-encode of log fields (or maybe a pseudo log_rewrite module?)

>Number:         1358
>Category:       mod_log-any
>Synopsis:       Selective url-encode of log fields (or maybe a pseudo log_rewrite module?)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          change-request
>Submitter-Id:   apache
>Arrival-Date:   Tue Nov  4 17:10:00 PST 1997
>Last-Modified:
>Originator:     mzamora@cbbanorte.com.mx
>Organization:
apache
>Release:        1.2x
>Environment:
Linux RedHat 4.1 kernel 2.0.27 on a PPro200Mhz
Apache 1.2b7 (I know, I'll upgrade to 1.24 as soon as I upgrade to RH4.3)
>Description:
Situation: 
  1) Common/Extended Log Format specify first line of request (and referring URL 
     for the ext. fmt.) *between* double quotes.
  2) As per PR#859, we can deduce that spaces in requested URL should be the
     *client's* problem, we can deduce that d-quotes in the URL are also
     the client's problem
Problem Encountered:
  We can't consistently parse an [EC]LF logfile either by whitespace delimiters
  (where the URL would ideally be field #7), or by double-quote delimiters
  (where the URL would be ws-delim subfield #2 in quote-enclosed field #6).
Diatribe:
  Ok, ok...: URL-encoding of requests is the client's responsibility, but
  parsing the #$%& logfiles of broken client's requests (especially in proxying
  servers) turns into the admin's nightmare.
  Have you ever parsed proxy logfiles of a bunch of people in one of those 
  web-chatrooms that do forms with GETs on quote-delimited searches?
  (i.e.: you get a bunch of URLs with embedded spaces *and* quotes).
  Try to identify the HTTP RESPONSE field in that mess in a consistent manner.
  I guarantee you won't be able to.
>How-To-Repeat:
GET a series of arbitrary URLs with embedded spaces and double quotes.
Now, take the logfile and try to identify the METHOD and HTTP RESPONSE fields 
without resorting to some sort of heuristic mumbo-jumbo (that eats up CPU 
cycles and turns impractical with logfiles in the order of hundreds of 
thousands of records a day).
>Fix:
Implement a url-encode field modifier for mod_log_config.
For example:
  CustomLog logs/access_log "%h %l %u %t \"%r\" %s %b"
gives would give us the existing behaviour, but
  CustomLog logs/access_log "%h %l %u %t \"%{Url-Enc %r}\" %s %b"
would url-encode the %r field.

(BTW, Apache is a *terriffic* work. You all have my eternal gratitude :-%2
>Audit-Trail:
>Unformatted: