You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Rainer Jung <ra...@kippdata.de> on 2010/11/08 16:16:32 UTC

Objections about formatting options to time stamp in access log

A typical annoyance when combining Apache web server and Tomcat is the 
difference in access log timestamp. Apache logs the beginning of the 
request, Tomcat logs the end of the request.

I added a feature to Apache trunk (will become 2.4) to make it 
configurable for Apache, which time stamp to choose ([1]). Furthermore 
the Apache web server allows to choose the format, in which the time 
stamp is being logged as something different than just "Common Log 
Format". This is all configured by an appropriate "format" in %{format}t.

Any a priori objections about porting the same functionality to Tomcat 
trunk?

Of course due to latency between Apache and Tomcat and due to different 
timeouts, even when using the same time stamp (begin or end of request) 
in both products, the timestamps will not always match exactly. But at 
the moment matching the entries is much harder, than it could be.

Regards,

Rainer

[1] http://httpd.apache.org/docs/trunk/mod/mod_log_config.html#formats

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Konstantin Kolinko <kn...@gmail.com>.
2010/11/11 Rainer Jung <ra...@kippdata.de>:
> On 10.11.2010 04:38, sebb wrote:
>>
>> The private static fields timezone and timeZone[No]DST are not final,
>> but appear to be shared between threads.
>> The method that creates them is synchronised, but methods that read
>> them don't appear to be.
>> It would be safer to make these fields final and create them in a
>> static block (or equivalent).
>
> Thanks, done. Updated patch at:
>
> http://people.apache.org/~rjung/patches/tomcat-trunk-accesslog-extended-timestamp-v2.patch
>

Looks interesting.

> buf.append(Long.toString(...))

won't produce leading zeros

> The addElement() in the interface isn't very helpful.

If there is a need to change the AccessLogElement interface, I think
that can be done. We already changed it between TC6 and TC7
(StringBuffer -> StringBuilder).


Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Rainer Jung <ra...@kippdata.de>.
On 10.11.2010 04:38, sebb wrote:
> The private static fields timezone and timeZone[No]DST are not final,
> but appear to be shared between threads.
> The method that creates them is synchronised, but methods that read
> them don't appear to be.
> It would be safer to make these fields final and create them in a
> static block (or equivalent).

Thanks, done. Updated patch at:

http://people.apache.org/~rjung/patches/tomcat-trunk-accesslog-extended-timestamp-v2.patch

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by sebb <se...@gmail.com>.
On 9 November 2010 21:35, Rainer Jung <ra...@kippdata.de> wrote:
> On 08.11.2010 16:16, Rainer Jung wrote:
>>
>> A typical annoyance when combining Apache web server and Tomcat is the
>> difference in access log timestamp. Apache logs the beginning of the
>> request, Tomcat logs the end of the request.
>>
>> I added a feature to Apache trunk (will become 2.4) to make it
>> configurable for Apache, which time stamp to choose ([1]). Furthermore
>> the Apache web server allows to choose the format, in which the time
>> stamp is being logged as something different than just "Common Log
>> Format". This is all configured by an appropriate "format" in %{format}t.
>
> ...
>>
>> [1] http://httpd.apache.org/docs/trunk/mod/mod_log_config.html#formats
>
> A first draft patch is availabel for review at
>
> http://people.apache.org/~rjung/patches/tomcat-trunk-accesslog-extended-timestamp.patch
>
> Status
> ======
>
> - docs are still missing
> - tested a bit also under load
> - not yet tested with multiple instances of the access log

The private static fields timezone and timeZone[No]DST are not final,
but appear to be shared between threads.
The method that creates them is synchronised, but methods that read
them don't appear to be.
It would be safer to make these fields final and create them in a
static block (or equivalent).

> - the aspect of async request handling is not clear to me.
>  There is request.getCoyoteRequest().getStartTime() and
>  request.setAttribute(t1Name, new Long(t1)) and I haven't
>  checked the relation between the two at the time of logging.
>
>
> What works
> ==========
>
> - %t still loggs response end time in Common Log Format
> - %t{FORMAT} logs in configurable format:
>  - FORMAT could be either "begin" or "end" or have a prefix of
>    "begin:" or "end:", which means "use the request start time"
>    respectively "use the response end time".
>  - stripping the optional prefix the remaining format can be either:
>    - one of the special tokens "sec" (seconds since epoch), "msec"
>      (milliseconds since epoch) or "msec_frac" (only sub second
>      milliseconds)
>    - any other remaining format string will be passed to
>      SimpleDateFormat
>
> Implementation Design
> =====================
>
> Common Log Format formats and SimpleDateFormat formats are cached (the other
> special formats are trivial to evaluate). Each different format
> specification occuring in the pattern has its own cache (and
> SimpleDateFormat).
>
> The cache is organized in two levels. A ThreadLocal caches the format for 60
> consecutive timestamps, a global synchronized cache for 300 consecutive
> timestamps. The design is more important than in the original implementation
> because request start times can jump more.
>
> All data is be consistent, i.e. begin times, end times and duration fit
> together even if multiple %{FORMAT}t entries are used in the log pattern.
>
> Finally if a general format string is used, which gets fed into
> SimpleDateFormat, I remove any "S" (milliseconds) and replace it with a
> marker string. This is necessary to allow the cache to be effective.
> Replacing the marker with the milliseconds is a trivial post-formatting tas
> and doesn't need complex calendar objects (I know there are leap seconds,
> but well ...).
>
> Performance Implications
> ========================
>
> I did a few quick stress tests on a two CPU Sparc Server with a trivial one
> byte plus no sessions JSP. There was no clear difference between the old and
> the new implementation (below 5%, which one was faster was not stable,
> around 7.500 requests per second for default config).
>
> Observations
> ============
>
> The addElement() in the interface isn't very helpful. The date and time
> arguments are very special for the DateAndTime and ElapsedTime formatter. I
> guess it would be better to have everything needed in request and response
> or pass a mre generic data object that contains additional meta data. I
> didn't change any of the interfaces though, because it might not be to
> unlikely that people have actually extended existing AccessLog
> implementations.
>
> I can't work on it during the next two days, but there will be some time
> during the weekend to proceed.
>
> Regards,
>
> Rainer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Rainer Jung <ra...@kippdata.de>.
On 08.11.2010 16:16, Rainer Jung wrote:
> A typical annoyance when combining Apache web server and Tomcat is the
> difference in access log timestamp. Apache logs the beginning of the
> request, Tomcat logs the end of the request.
>
> I added a feature to Apache trunk (will become 2.4) to make it
> configurable for Apache, which time stamp to choose ([1]). Furthermore
> the Apache web server allows to choose the format, in which the time
> stamp is being logged as something different than just "Common Log
> Format". This is all configured by an appropriate "format" in %{format}t.
...
>
> [1] http://httpd.apache.org/docs/trunk/mod/mod_log_config.html#formats

A first draft patch is availabel for review at

http://people.apache.org/~rjung/patches/tomcat-trunk-accesslog-extended-timestamp.patch

Status
======

- docs are still missing
- tested a bit also under load
- not yet tested with multiple instances of the access log
- the aspect of async request handling is not clear to me.
   There is request.getCoyoteRequest().getStartTime() and
   request.setAttribute(t1Name, new Long(t1)) and I haven't
   checked the relation between the two at the time of logging.


What works
==========

- %t still loggs response end time in Common Log Format
- %t{FORMAT} logs in configurable format:
   - FORMAT could be either "begin" or "end" or have a prefix of
     "begin:" or "end:", which means "use the request start time"
     respectively "use the response end time".
   - stripping the optional prefix the remaining format can be either:
     - one of the special tokens "sec" (seconds since epoch), "msec"
       (milliseconds since epoch) or "msec_frac" (only sub second
       milliseconds)
     - any other remaining format string will be passed to
       SimpleDateFormat

Implementation Design
=====================

Common Log Format formats and SimpleDateFormat formats are cached (the 
other special formats are trivial to evaluate). Each different format 
specification occuring in the pattern has its own cache (and 
SimpleDateFormat).

The cache is organized in two levels. A ThreadLocal caches the format 
for 60 consecutive timestamps, a global synchronized cache for 300 
consecutive timestamps. The design is more important than in the 
original implementation because request start times can jump more.

All data is be consistent, i.e. begin times, end times and duration fit 
together even if multiple %{FORMAT}t entries are used in the log pattern.

Finally if a general format string is used, which gets fed into 
SimpleDateFormat, I remove any "S" (milliseconds) and replace it with a 
marker string. This is necessary to allow the cache to be effective. 
Replacing the marker with the milliseconds is a trivial post-formatting 
tas and doesn't need complex calendar objects (I know there are leap 
seconds, but well ...).

Performance Implications
========================

I did a few quick stress tests on a two CPU Sparc Server with a trivial 
one byte plus no sessions JSP. There was no clear difference between the 
old and the new implementation (below 5%, which one was faster was not 
stable, around 7.500 requests per second for default config).

Observations
============

The addElement() in the interface isn't very helpful. The date and time 
arguments are very special for the DateAndTime and ElapsedTime 
formatter. I guess it would be better to have everything needed in 
request and response or pass a mre generic data object that contains 
additional meta data. I didn't change any of the interfaces though, 
because it might not be to unlikely that people have actually extended 
existing AccessLog implementations.

I can't work on it during the next two days, but there will be some time 
during the weekend to proceed.

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Rainer Jung <ra...@kippdata.de>.
On 08.11.2010 22:04, Rainer Jung wrote:
> On 08.11.2010 21:25, Konstantin Kolinko wrote:
>> My current idea with implementation is that the timestamp when request
>> processing started can be placed in a field in the request. Such
>> solution will be compatible with the recently added AccessLog
>> interface.

It's actually already there:

request.getCoyoteRequest().getStartTime();

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Rainer Jung <ra...@kippdata.de>.
On 08.11.2010 21:25, Konstantin Kolinko wrote:
> 2010/11/8 Rainer Jung<ra...@kippdata.de>:
>> A typical annoyance when combining Apache web server and Tomcat is the
>> difference in access log timestamp. Apache logs the beginning of the
>> request, Tomcat logs the end of the request.
>>
>
> According to the HTTPD docs [2], this changed between HTTPD 1.3 and 2.
> 1.3 logs the time when the request processing finished, 2+ logs when it started.
>
> [2] http://httpd.apache.org/docs/1.3/logs.html

OK, but 1.3 isn't really relevant any longer ...

>> I added a feature to Apache trunk (will become 2.4) to make it configurable
>> for Apache, which time stamp to choose ([1]). Furthermore the Apache web
>> server allows to choose the format, in which the time stamp is being logged
>> as something different than just "Common Log Format". This is all configured
>> by an appropriate "format" in %{format}t.
>>
>> Any a priori objections about porting the same functionality to Tomcat
>> trunk?
>>
>
> I thought about it already. I would like this feature.
>
> My current idea with implementation is that the timestamp when request
> processing started can be placed in a field in the request. Such
> solution will be compatible with the recently added AccessLog
> interface.

Since any single invocation of addElement() only needs either begin or 
end time, I was thinking about deciding which timestamp to pass when 
calling addElement(). But it might be a bit faster and more generic to 
add the timestamp to the request and let the AccessLogElement retrieve 
it from there.

> There might be a caveat with caching of formatted timestamp strings in
> the access log valve implementations.

Yes, but the momentary implementation isn't that nice, e.g. if you want 
correct sub second resolution. I'm experimenting with something similar 
to httpd, namely using a cyclic buffer of N formatted timestamps 
(seconds granularity). That buffer could be thread local (like the 
currentDateStruct right now). If a new timestamp is not contained in the 
buffer, it could call out to a synchronized global buffer which then 
does the actual formatting. That way there will not be to much locking 
(only if a thread sees a second not in its local cache) and the 
formatting is mostly only done once for each second.

> What happens with async requests?

Good question, I need to see, what the momentary behaviour of the access 
log is with respect to the async lifecycle.

>> Of course due to latency between Apache and Tomcat and due to different
>> timeouts, even when using the same time stamp (begin or end of request) in
>> both products, the timestamps will not always match exactly. But at the
>> moment matching the entries is much harder, than it could be.
>>
>
> If one needs to match them it would be better to do so by certain id,
> not by a timestamp.
>
> Something like %{forensic-id}n  [3]
>
> [3] http://httpd.apache.org/docs/current/mod/mod_log_forensic.html

Yes! There is mod_unique_id, which generates unique IDs also for 
mod_log_forensic. One can set those as additional request headers and 
that way pass them to Tomcat and log there. I was thinking about a 
simple filter, that

- looks for an incoming ID by checking a configured header

- if the header is not set generates an own ID basically using an atomic

- Checks the presence of a few well known log frameworks and sets the ID 
as a mapped diagnostic context (log4j terminology) or similar so that 
you can configure your webapp logging and in case of log4j also Tomcat 
logging to include the unique ID on each log line.

- optionally reflects the ID by setting a response header, which allows 
to log it on the proxy and also to check its correctness on the proxy.

But first the access log ...

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Konstantin Kolinko <kn...@gmail.com>.
2010/11/8 Rainer Jung <ra...@kippdata.de>:
> A typical annoyance when combining Apache web server and Tomcat is the
> difference in access log timestamp. Apache logs the beginning of the
> request, Tomcat logs the end of the request.
>

According to the HTTPD docs [2], this changed between HTTPD 1.3 and 2.
1.3 logs the time when the request processing finished, 2+ logs when it started.

[2] http://httpd.apache.org/docs/1.3/logs.html

> I added a feature to Apache trunk (will become 2.4) to make it configurable
> for Apache, which time stamp to choose ([1]). Furthermore the Apache web
> server allows to choose the format, in which the time stamp is being logged
> as something different than just "Common Log Format". This is all configured
> by an appropriate "format" in %{format}t.
>
> Any a priori objections about porting the same functionality to Tomcat
> trunk?
>

I thought about it already. I would like this feature.

My current idea with implementation is that the timestamp when request
processing started can be placed in a field in the request. Such
solution will be compatible with the recently added AccessLog
interface.

There might be a caveat with caching of formatted timestamp strings in
the access log valve implementations. What happens with async
requests?

> Of course due to latency between Apache and Tomcat and due to different
> timeouts, even when using the same time stamp (begin or end of request) in
> both products, the timestamps will not always match exactly. But at the
> moment matching the entries is much harder, than it could be.
>

If one needs to match them it would be better to do so by certain id,
not by a timestamp.

Something like %{forensic-id}n  [3]

[3] http://httpd.apache.org/docs/current/mod/mod_log_forensic.html

> Regards,
>
> Rainer
>
> [1] http://httpd.apache.org/docs/trunk/mod/mod_log_config.html#formats
>

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Objections about formatting options to time stamp in access log

Posted by Mark Thomas <ma...@apache.org>.
On 08/11/2010 15:16, Rainer Jung wrote:
> A typical annoyance when combining Apache web server and Tomcat is the
> difference in access log timestamp. Apache logs the beginning of the
> request, Tomcat logs the end of the request.
> 
> I added a feature to Apache trunk (will become 2.4) to make it
> configurable for Apache, which time stamp to choose ([1]). Furthermore
> the Apache web server allows to choose the format, in which the time
> stamp is being logged as something different than just "Common Log
> Format". This is all configured by an appropriate "format" in %{format}t.
> 
> Any a priori objections about porting the same functionality to Tomcat
> trunk?

None here. I'd find it useful.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org