You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Eric Covener <co...@gmail.com> on 2016/04/13 21:43:45 UTC

allow newlines in T_ESCAPE_LOGITEM?

Currently newlines get backslash-escaped if written to the errorlog.
This is via server/gen_test_char.c and stems from an ancient vuln
about escape sequences in log files potentially affecting peoples
terminals when cat'ed.

On a few occasions I have worked with some libraries that return a
newline-formatted buffer that I'd love to dump to the error log
without scanning and splitting up into multiple log entries.

I also occasionally see \n's from third-party mods once in a while.

Any opinions on punching a hole for \n in T_ESCAPE_LOGITEM?

RE: allow newlines in T_ESCAPE_LOGITEM?

Posted by Plüm, Rüdiger, Vodafone Group <ru...@vodafone.com>.

> -----Original Message-----
> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
> Sent: Mittwoch, 13. April 2016 23:16
> To: httpd-dev
> Subject: Re: allow newlines in T_ESCAPE_LOGITEM?
> 
> On Wed, Apr 13, 2016 at 11:08 PM, Eric Covener <co...@gmail.com> wrote:
> > On Wed, Apr 13, 2016 at 5:05 PM, Daniel Lescohier
> > <da...@cbsi.com> wrote:
> >> Isn't T_ESCAPE_LOGITEM also used by mod_log_config's use of
> >> ap_escape_logitem?  We rely on the API that data from HTTP requests
> that are
> >> logged in our mod_log_config logfiles are newline-escaped, so that one
> line
> >> in the logfile is parsed as one log entry.  Our parsers first split on
> >> newline to get records, then splits the fields of the record on the
> field
> >> delimiter to get fields, then it unescapes the backslash-escapes to get
> the
> >> original data for that field.
> >
> > You make a good point, it couldn't change and affect current callers
> > of ap_escape_logitem().
> 
> IMHO, even ErrorLog shouldn't contain splitted lines (w/o "[date]
> [level] [pid]" prefix).

+1 and -0.9 on not escaping \n.

Regards

Rüdiger

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Daniel Lescohier <da...@cbsi.com>.
It's especially important when doing log processing on Apache Hadoop, if
you give uncompressed text files as input files to a Hadoop job, it'd split
large log files on newlines to be processed on multiple nodes.  That split
should be done on a record boundary.

On Wed, Apr 13, 2016 at 5:16 PM, Yann Ylavic <yl...@gmail.com> wrote:

> On Wed, Apr 13, 2016 at 11:08 PM, Eric Covener <co...@gmail.com> wrote:
> > On Wed, Apr 13, 2016 at 5:05 PM, Daniel Lescohier
> > <da...@cbsi.com> wrote:
> >> Isn't T_ESCAPE_LOGITEM also used by mod_log_config's use of
> >> ap_escape_logitem?  We rely on the API that data from HTTP requests
> that are
> >> logged in our mod_log_config logfiles are newline-escaped, so that one
> line
> >> in the logfile is parsed as one log entry.  Our parsers first split on
> >> newline to get records, then splits the fields of the record on the
> field
> >> delimiter to get fields, then it unescapes the backslash-escapes to get
> the
> >> original data for that field.
> >
> > You make a good point, it couldn't change and affect current callers
> > of ap_escape_logitem().
>
> IMHO, even ErrorLog shouldn't contain splitted lines (w/o "[date]
> [level] [pid]" prefix).
>

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Apr 13, 2016 at 11:08 PM, Eric Covener <co...@gmail.com> wrote:
> On Wed, Apr 13, 2016 at 5:05 PM, Daniel Lescohier
> <da...@cbsi.com> wrote:
>> Isn't T_ESCAPE_LOGITEM also used by mod_log_config's use of
>> ap_escape_logitem?  We rely on the API that data from HTTP requests that are
>> logged in our mod_log_config logfiles are newline-escaped, so that one line
>> in the logfile is parsed as one log entry.  Our parsers first split on
>> newline to get records, then splits the fields of the record on the field
>> delimiter to get fields, then it unescapes the backslash-escapes to get the
>> original data for that field.
>
> You make a good point, it couldn't change and affect current callers
> of ap_escape_logitem().

IMHO, even ErrorLog shouldn't contain splitted lines (w/o "[date]
[level] [pid]" prefix).

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Eric Covener <co...@gmail.com>.
On Wed, Apr 13, 2016 at 5:05 PM, Daniel Lescohier
<da...@cbsi.com> wrote:
> Isn't T_ESCAPE_LOGITEM also used by mod_log_config's use of
> ap_escape_logitem?  We rely on the API that data from HTTP requests that are
> logged in our mod_log_config logfiles are newline-escaped, so that one line
> in the logfile is parsed as one log entry.  Our parsers first split on
> newline to get records, then splits the fields of the record on the field
> delimiter to get fields, then it unescapes the backslash-escapes to get the
> original data for that field.

You make a good point, it couldn't change and affect current callers
of ap_escape_logitem().

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Daniel Lescohier <da...@cbsi.com>.
Isn't T_ESCAPE_LOGITEM also used by mod_log_config's use of
ap_escape_logitem?  We rely on the API that data from HTTP requests that
are logged in our mod_log_config logfiles are newline-escaped, so that one
line in the logfile is parsed as one log entry.  Our parsers first split on
newline to get records, then splits the fields of the record on the field
delimiter to get fields, then it unescapes the backslash-escapes to get the
original data for that field.

On Wed, Apr 13, 2016 at 3:43 PM, Eric Covener <co...@gmail.com> wrote:

> Currently newlines get backslash-escaped if written to the errorlog.
> This is via server/gen_test_char.c and stems from an ancient vuln
> about escape sequences in log files potentially affecting peoples
> terminals when cat'ed.
>
> On a few occasions I have worked with some libraries that return a
> newline-formatted buffer that I'd love to dump to the error log
> without scanning and splitting up into multiple log entries.
>
> I also occasionally see \n's from third-party mods once in a while.
>
> Any opinions on punching a hole for \n in T_ESCAPE_LOGITEM?
>

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Daniel Lescohier <da...@cbsi.com>.
Of course, if you use "reliable piped logging" then if you also write more
than PIPE_BUF bytes (4kiB on Linux) in a log line, it's not guaranteed to
be atomic.

I have been thinking of how to work around that limit, by logging to an
Apache Flume Thrift RPC port.  But to avoid the 4kiB limit you'd have to
put the Thrift client inside the httpd server.  You'd really want to have a
dedicated thread in each child process for the Thrift client, so that it
can accumulate a batch of messages for the Flume server: otherwise, you'd
be putting a lot of transaction load on Apache Flume.  So the httpd threads
would put their log lines on a synchronized queue consumed by the logging
thread.  Each child process would have it's own TCP connection to Apache
Flume, so there's none of the atomic message issues you have with >
PIPE_BUF messages using a Pipe.

It'd also be nice if Apache Thrift had a c_apr target in addition to the
c_glib target, so that you don't have to bring in glib to use it.

Anyway, it seemed to be some work to implement, so I think I will just live
with the limit of 4kiB atomic log entries.

Here is the Flume Thrift protocol:

namespace java org.apache.flume.thrift

struct ThriftFlumeEvent {
  1: required map <string, string> headers,
  2: required binary body,
}

enum Status {
  OK,
  FAILED,
  ERROR,
  UNKNOWN
}

service ThriftSourceProtocol {
  Status append(1: ThriftFlumeEvent event),
  Status appendBatch(1: list<ThriftFlumeEvent> events),
}


On Mon, Apr 18, 2016 at 4:37 PM, William A Rowe Jr <wr...@rowe-clan.net>
wrote:

> That doesn't work because it's no longer atomic, those log lines end up
> fanned out among all the other worker threads doing the same thing.  Very
> hard to parse.
>
> On Mon, Apr 18, 2016 at 3:08 PM, Stefan Fritsch <sf...@sfritsch.de> wrote:
>
>> On Monday 18 April 2016 10:42:23, Eric Covener wrote:
>> > I'm won over and not tinkering with this FWIW.
>>
>> Maybe add a utility function that splits a message on \n and calls
>> ap_log_rerror for each line?
>>
>
>

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
That doesn't work because it's no longer atomic, those log lines end up
fanned out among all the other worker threads doing the same thing.  Very
hard to parse.

On Mon, Apr 18, 2016 at 3:08 PM, Stefan Fritsch <sf...@sfritsch.de> wrote:

> On Monday 18 April 2016 10:42:23, Eric Covener wrote:
> > I'm won over and not tinkering with this FWIW.
>
> Maybe add a utility function that splits a message on \n and calls
> ap_log_rerror for each line?
>

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Stefan Fritsch <sf...@sfritsch.de>.
On Monday 18 April 2016 10:42:23, Eric Covener wrote:
> I'm won over and not tinkering with this FWIW.

Maybe add a utility function that splits a message on \n and calls 
ap_log_rerror for each line?

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Eric Covener <co...@gmail.com>.
I'm won over and not tinkering with this FWIW.

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Daniel Lescohier <da...@cbsi.com>.
Doing this in the httpd server will prevent use-cases of programmatically
parsing the log files.  I think the data in the file should support both
use-cases

   - Parsing the log file data for data analysis.
   - Humans reading the logs.

So, the raw data should support both use-cases.

For a human reading the logs, that should be handled by an application that
parses the raw data and presents it in human readable format.  Here is a
simple application:

#!/usr/bin/python
import sys, fileinput
write = sys.stdout.write
separator = '-'*78 + '\n'
for line in fileinput.input():
    lines = line.decode("string_escape")
    write(lines)
    if lines.count('\n') > 1:
        write(separator)

You could also have an application with much more functionality,
color-coding fields, using underline to separate records instead of dashes,
whatever you want.

The above application may not meet exactly what the OP wants: but that's my
point: each user's use-case is different, and the raw data should support
all the use-cases.



On Mon, Apr 18, 2016 at 8:28 AM, Stefan Eissing <
stefan.eissing@greenbytes.de> wrote:

> Would it make sense to replace \n with \n\t?
>
> > Am 18.04.2016 um 14:23 schrieb Jim Jagielski <ji...@jaguNET.com>:
> >
> > No opinion :)
> >
> >> On Apr 13, 2016, at 3:43 PM, Eric Covener <co...@gmail.com> wrote:
> >>
> >> Currently newlines get backslash-escaped if written to the errorlog.
> >> This is via server/gen_test_char.c and stems from an ancient vuln
> >> about escape sequences in log files potentially affecting peoples
> >> terminals when cat'ed.
> >>
> >> On a few occasions I have worked with some libraries that return a
> >> newline-formatted buffer that I'd love to dump to the error log
> >> without scanning and splitting up into multiple log entries.
> >>
> >> I also occasionally see \n's from third-party mods once in a while.
> >>
> >> Any opinions on punching a hole for \n in T_ESCAPE_LOGITEM?
> >
>
>

Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Stefan Eissing <st...@greenbytes.de>.
Would it make sense to replace \n with \n\t?

> Am 18.04.2016 um 14:23 schrieb Jim Jagielski <ji...@jaguNET.com>:
> 
> No opinion :)
> 
>> On Apr 13, 2016, at 3:43 PM, Eric Covener <co...@gmail.com> wrote:
>> 
>> Currently newlines get backslash-escaped if written to the errorlog.
>> This is via server/gen_test_char.c and stems from an ancient vuln
>> about escape sequences in log files potentially affecting peoples
>> terminals when cat'ed.
>> 
>> On a few occasions I have worked with some libraries that return a
>> newline-formatted buffer that I'd love to dump to the error log
>> without scanning and splitting up into multiple log entries.
>> 
>> I also occasionally see \n's from third-party mods once in a while.
>> 
>> Any opinions on punching a hole for \n in T_ESCAPE_LOGITEM?
> 


Re: allow newlines in T_ESCAPE_LOGITEM?

Posted by Jim Jagielski <ji...@jaguNET.com>.
No opinion :)

> On Apr 13, 2016, at 3:43 PM, Eric Covener <co...@gmail.com> wrote:
> 
> Currently newlines get backslash-escaped if written to the errorlog.
> This is via server/gen_test_char.c and stems from an ancient vuln
> about escape sequences in log files potentially affecting peoples
> terminals when cat'ed.
> 
> On a few occasions I have worked with some libraries that return a
> newline-formatted buffer that I'd love to dump to the error log
> without scanning and splitting up into multiple log entries.
> 
> I also occasionally see \n's from third-party mods once in a while.
> 
> Any opinions on punching a hole for \n in T_ESCAPE_LOGITEM?