Posted to log4j-dev@logging.apache.org by Curt Arnold <ca...@apache.org> on 2004/12/20 18:11:42 UTC

Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

On Dec 20, 2004, at 8:33 AM, Ceki Gülcü wrote:
>
> CachedDateFormat is pretty interesting. However, I suspect that it may
> not work properly in case the date format passed by the user causes
> the length of the formatted output to change over time. For example, if
> the format is 'yyyy-MMMMM-dd HH:mm:ss' and the cache is initialized on
> the 4th of July, it will work properly for the remainder of the month but
> start returning erroneous results in August. Do you concur with this
> analysis?
>


I'm pretty sure that this is your quote, not mine (though I haven't 
gone through the list to check).  I think that the current 
implementation of CachedDateFormat (in the last patch on the bug 
report) safely handles length transitions; however, it would be 
good to add unit tests to confirm this.
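
To make the length transition concrete, here is a small sketch that uses only java.text.SimpleDateFormat. The class name LengthTransitionDemo, the year 2004 and the noon UTC timestamps are choices made for illustration and do not come from the patch. With SimpleDateFormat, four or more 'M' letters select the full month name, so in an English locale the output grows by two characters between July and August.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of log4j or of the Bugzilla patch.
final class LengthTransitionDemo {
    // Formats noon (UTC) on the given day of 2004 with the pattern from the quote.
    static String format(int month, int day) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = Calendar.getInstance(utc, Locale.US);
        cal.clear();
        cal.set(2004, month, day, 12, 0, 0);
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MMMMM-dd HH:mm:ss", Locale.US);
        fmt.setTimeZone(utc);
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) {
        String july = format(Calendar.JULY, 4);
        String august = format(Calendar.AUGUST, 4);
        // The two lengths differ, which is the case a unit test should cover.
        System.out.println(july + " -> length " + july.length());
        System.out.println(august + " -> length " + august.length());
    }
}
```

A unit test for CachedDateFormat could use a pair of instants like these to check that the cached output still matches the uncached one after the month changes.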


> As for the locale support in PatternLayout, it is probably
> overkill. It makes the code harder to understand and to maintain. I'd
> wait for someone using a different numbering system to contact us
> before doing it on our own.
>
>> Add TimeZone and locale for PatternLayout, remove obs DateFormat
>> http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=32064
>>
>> This last one I need to review after the recent discussions on 
>> jakarta-commons.  It may be more appropriate to specify locale on the 
>> appender instead of the layout.
>>


There are a couple of issues bundled into this one bug report and it 
might be good to separate them and discuss and act on them 
individually.  The issues as I see them are:

The existing AbsoluteTimeDateFormat, ISO8601DateFormat, and 
DateTimeDateFormat contained buggy caching code and had been 
effectively abandoned since PatternLayout no longer created these 
classes, but created java.text.SimpleDateFormat objects.  The proposed 
resolution was to reimplement those classes as wrappers of 
SimpleDateFormat.

The flawed caching code in the unused DateFormats, if properly 
implemented, could result in a noticeable performance benefit.  A new 
class, CachedDateFormat, was written that could wrap any DateFormat.  
If the class is introduced, then PatternLayout should be modified to 
wrap the DateFormat that it constructs with CachedDateFormat.  If 
CachedDateFormat proved to be unreliable, then it would be trivial to 
remove by changing a line or so in PatternLayout.

CachedDateFormat attempted to support multiple digit sets.  However, I 
couldn't find any stock Java locales that used a digit set other than 
0-9 in its date formats.  I had expected that the Thai locale would use 
Thai digits, but I was wrong.

Date formatting was affected by the current locale and timezone of the 
thread and there was no mechanism to configure a timezone or locale to 
be used.  The existing patches added configurable timezones and locales 
to the pattern layout which would modify the behavior of the date 
formats.  Based on some of the previous discussions on the Jakarta 
Commons Dev list, I'd like to evaluate whether Appender is a better 
place for the locale to be specified.

What I'd like to do is:

Commit simplifications to the DateFormats and add CachedDateFormat, but 
simplified to recognize only Arabic digit sets.

Review configurable locales and timezones and come back to the list 
with a specific recommendation.  My current take is that appender is 
probably a more appropriate place to specify locale.  However, that 
should be considered in a bigger scope where locale affects both the 
layout and rendering non-string messages.  TimeZone is likely still 
appropriate to configure on the layout.


---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org


Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Curt Arnold <ca...@houston.rr.com>.
>> It could only happen if at some point in time, some field or fields 
>> were lengthened and one or more fields were shortened by the same 
>> amount. If a simple algorithm could be devised to detect such cases 
>> beforehand, then CachedDateFormat is a winner.
>
> I had mentioned "SS0" was a potentially malicious time format.  In the 
> code where PatternLayout creates the SimpleDateFormat, the format 
> string could be sanity checked.  If it appeared troublesome, then the 
> SimpleDateFormat would not be wrapped with CachedDateFormat.  If you 
> wanted to be very careful, you'd skip caching anything containing 
> "G", "MMM", "E", "z" and things with only one or two "S".
>
> Another approach would be to staleness date the millisecond field so 
> that you don't trust it if it is, say, over a minute old.
>
>

Or probably better, redetermine the millisecond location if the length 
changed (the current criterion) or if the corresponding characters are 
not digits.  You'd still need a guard that you either had "SSS" or no 
"S", but I think the additional check would make it extremely hard to 
trick the cache into producing an invalid date.
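
The extra digit check could look roughly like the sketch below. MillisFieldGuard and looksLikeMillis are hypothetical names, and the check assumes ASCII digits only, in line with the plan to recognize only Arabic digit sets.

```java
// Hypothetical helper; not part of the patch on the bug report.
final class MillisFieldGuard {
    // Returns true if the three characters starting at 'start' are ASCII
    // digits. A false result would trigger a fresh search for the field.
    static boolean looksLikeMillis(CharSequence formatted, int start) {
        if (start < 0 || start + 3 > formatted.length()) {
            return false;
        }
        for (int i = start; i < start + 3; i++) {
            char c = formatted.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String cached = "12:00:00,123";
        System.out.println(looksLikeMillis(cached, 9)); // position of the digits
        System.out.println(looksLikeMillis(cached, 8)); // starts on the comma
    }
}
```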




Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Curt Arnold <ca...@apache.org>.
On Dec 21, 2004, at 9:54 AM, Ceki Gülcü wrote:
>
>> CachedDateFormat would not be able to detect the milliseconds field 
>> on RelativeTimeDateFormat unless the starting time was an integral 
>> second and would not be able to detect millisecond fields if 
>> non-arabic digits were set.  In either of these cases, you would have 
>> an extra call per format evaluation.  I believe the original patch 
>> avoided caching RelativeTimeDateFormat.
>
> Making DatePatternConverter aware of CachedDateFormat would avoid 
> caching RelativeTimeDateFormat.
>

I believe the original integration with pattern layout did attempt to 
cache RelativeTimeDateFormat.  I was just saying that it would be 
technically possible to hand code that combination and if you did, it 
would just perform a little slower.

>> The worst-case scenario is if you could construct a date-time format 
>> where the location of the millisecond field changed, but the total 
>> length of the field did not.  I don't think that you could create one 
>> with SimpleDateFormat, however you could obviously write a custom 
>> DateFormat that did.
>
> It could only happen if at some point in time, some field or fields 
> were lengthened and one or more fields were shortened by the same 
> amount. If a simple algorithm could be devised to detect such cases 
> beforehand, then CachedDateFormat is a winner.

I had mentioned "SS0" was a potentially malicious time format.  In the 
code where PatternLayout creates the SimpleDateFormat, the format 
string could be sanity checked.  If it appeared troublesome, then the 
SimpleDateFormat would not be wrapped with CachedDateFormat.  If you 
wanted to be very careful, you'd skip caching anything containing 
"G", "MMM", "E", "z" and things with only one or two "S".

Another approach would be to staleness date the millisecond field so 
that you don't trust it if it is, say, over a minute old.
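
A rough sketch of the format-string sanity check follows. The class and method names are hypothetical, and two details are assumptions on top of what is said above: quoted literal text is stripped before checking, and a pattern is accepted only if it has no "S" at all or exactly one run of "SSS".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check that PatternLayout could run before wrapping a
// SimpleDateFormat; not the code from the bug report.
final class CachePatternCheck {
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");
    private static final Pattern S_RUN = Pattern.compile("S+");

    static boolean isSafeToCache(String datePattern) {
        // Ignore literal text such as 'GMT' so it cannot trip the checks.
        String letters = QUOTED.matcher(datePattern).replaceAll("");
        // Variable-width fields: era, text months, day names, zone names.
        if (letters.contains("G") || letters.contains("MMM")
                || letters.contains("E") || letters.contains("z")) {
            return false;
        }
        // Accept no millisecond field, or exactly one "SSS" field.
        Matcher m = S_RUN.matcher(letters);
        int runs = 0;
        while (m.find()) {
            runs++;
            if (m.group().length() != 3) {
                return false;
            }
        }
        return runs <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isSafeToCache("yyyy-MM-dd HH:mm:ss,SSS"));
        System.out.println(isSafeToCache("SS0"));
    }
}
```

Being conservative here costs little: a rejected pattern simply keeps using the plain SimpleDateFormat.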


>
>> There is an observable difference when running the performance tests 
>> to a null appender with CachedDateFormat.  However, it may not be 
>> significant in more realistic deployments.  It is a significant 
>> improvement over the flawed (and currently unused) caching code in 
>> the original DateFormats.  However, the original motivation for the 
>> caching may no longer be relevant and so a new CachedDateFormat may 
>> not have a performance benefit that justifies the added complexity.
>
> Mostly agreed. However, I still wonder whether with a little extra 
> work CachedDateFormat could not be polished to become a pearl.
>
>> The second pass used localized values of both 0 and 9 to identify the 
>> millisecond field.  If the default locale changed, CachedDateFormat 
>> would not switch locales until the next integral second.  There may 
>> be other issues that come to pass with any locale rework, so maybe 
>> the best approach is to leave CachedDateFormat out for now.  It will 
>> be available in Bugzilla in case someone ever wants to add it later.
>
> I'll give it a shot if you don't mind.




Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Ceki Gülcü <ce...@qos.ch>.
Curt,

You are several steps ahead of me. I had seen but had not paid attention to:

  if (now < previousTime + 1000L && now >= previousTime) {
    ...
  } else {
    ....
    //
    //   if the length changed then
    //      recalculate the millisecond position
    if (cache.length() != prevLength) {
      ....
      //    detect the start of the millisecond field
      millisecondStart = findMillisecondStart(previousTime,
                                              tempBuffer.toString(),
                                              formatter);
    }
  }

The above lines nicely take care of the case where the position of the
millisecond field in the formatted output varies over time. The above
is both simple and efficient - brilliant stuff.

At 12:01 AM 12/21/2004, Curt Arnold wrote:

>For those tuning in late: The basic idea of the cached date format is that 
>if the time is within the same integral second as a previous request, then 
>only the milliseconds field needs to be rewritten.  To find the 
>milliseconds field, on the first request (or any request where the total 
>length of the formatted field has changed), two times only differing in 
>the number of milliseconds are output and the results are analyzed.  If 
>the milliseconds format is unrecognized, then the CachedDateFormat will 
>simply delegate to the underlying DateFormat.

I apologize for forcing you to explain this all over again and wasting your 
time.

>CachedDateFormat would not be able to detect the milliseconds field on 
>RelativeTimeDateFormat unless the starting time was an integral second and 
>would not be able to detect millisecond fields if non-arabic digits were 
>set.  In either of these cases, you would have an extra call per format 
>evaluation.  I believe the original patch avoided caching 
>RelativeTimeDateFormat.

Making DatePatternConverter aware of CachedDateFormat would avoid caching 
RelativeTimeDateFormat.

>The worst-case scenario is if you could construct a date-time format where 
>the location of the millisecond field changed, but the total length of the 
>field did not.  I don't think that you could create one with 
>SimpleDateFormat, however you could obviously write a custom DateFormat 
>that did.

It could only happen if at some point in time, some field or fields 
were lengthened and one or more fields were shortened by the same amount. 
If a simple algorithm could be devised to detect such cases beforehand, then 
CachedDateFormat is a winner.

>There is an observable difference when running the performance tests to a 
>null appender with CachedDateFormat.  However, it may not be significant 
>in more realistic deployments.  It is a significant improvement over the 
>flawed (and currently unused) caching code in the original 
>DateFormats.  However, the original motivation for the caching may no 
>longer be relevant and so a new CachedDateFormat may not have a 
>performance benefit that justifies the added complexity.

Mostly agreed. However, I still wonder whether with a little extra work 
CachedDateFormat could not be polished to become a pearl.

>The second pass used localized values of both 0 and 9 to identify the 
>millisecond field.  If the default locale changed, CachedDateFormat would 
>not switch locales until the next integral second.  There may be other 
>issues that come to pass with any locale rework, so maybe the best 
>approach is to leave CachedDateFormat out for now.  It will be available 
>in Bugzilla in case someone ever wants to add it later.

I'll give it a shot if you don't mind.


-- 
Ceki Gülcü

   The complete log4j manual: http://qos.ch/log4j/





Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Ceki Gülcü <ce...@qos.ch>.
At 07:01 PM 12/21/2004, Curt Arnold wrote:

>Allowing TZ to be specified at the repository level would add an 
>interaction between layout and repository that I don't believe currently 
>exists and I don't see it adds much additional value.

Excellent point. Now note that as explained in [1], there are several
cases where it makes sense to let most, if not all, log4j components
know about the LoggerRepository they are attached to.

Use case 1: Loggers retrieve resource bundles from the LR. (Loggers
already know their LR.) BTW, this assumes that only a single localization
is performed for all appenders. If localization is performed per
appender, then it makes sense for appender to retrieve locale specific
resource bundles from the LR, implying that appenders know about their
LR.

Use case 2: PatternLayout retrieves new conversion words from its
LR. (In 1.3, PatternLayout can learn new conversion words on the fly.)
Mrs. Piggy would set up new conversion words in an XML config
file. These new rules are placed within the LR and all PatternLayout
instances inherit those new words from the LR.

Use case 3: the %logger2 pattern converter will shorten package or
class names according to a mapping specified by the user. Again,
Mrs. Piggy would specify the mapping in a config file. This mapping
would be shared by all instances of %logger2.

Use case 4: Properties (key=value pairs) can be set at the LR
level. It kinda makes sense to share these values across components.
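
As a rough illustration of use case 3, a shared mapping for shortening logger names might look like the sketch below. Everything in it is hypothetical: the class LoggerNameShortener, its methods and the sample prefixes are made up for this example, and the actual %logger2 converter may work quite differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping object that could live in the LoggerRepository and be
// shared by all converter instances.
final class LoggerNameShortener {
    private final Map<String, String> prefixToAlias = new LinkedHashMap<>();

    LoggerNameShortener map(String prefix, String alias) {
        prefixToAlias.put(prefix, alias);
        return this;
    }

    // Replaces the first registered prefix that matches on a package boundary.
    String shorten(String loggerName) {
        for (Map.Entry<String, String> e : prefixToAlias.entrySet()) {
            String prefix = e.getKey();
            if (loggerName.equals(prefix)
                    || loggerName.startsWith(prefix + ".")) {
                return e.getValue() + loggerName.substring(prefix.length());
            }
        }
        return loggerName;
    }

    public static void main(String[] args) {
        LoggerNameShortener shortener =
                new LoggerNameShortener().map("org.apache.log4j", "o.a.l");
        System.out.println(shortener.shorten("org.apache.log4j.PatternLayout"));
    }
}
```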

[1] http://marc.theaimsgroup.com/?l=log4j-dev&m=110357800507837&w=2

>I think that we should pick one place to add it and then gather feedback to 
>see if we picked the wrong place or need to add additional specification 
>points.


Expecting the end-user to understand all possibilities and return with
valuable suggestions does not always work and in general takes a very
long time. Actually, expecting experts to come back with valuable
input does not always work either. The best approach is thinking
about the problem as hard as you can, covering as many aspects as you
can, and coming back with a proposal, or even better, with an
implementation. However, you seem to know this better than I do.



-- 
Ceki Gülcü

   The complete log4j manual: http://qos.ch/log4j/





Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Curt Arnold <ca...@apache.org>.
On Dec 21, 2004, at 11:41 AM, Mark R Durman/CA/US/MQSolutions wrote:

>
> Setting the Time Zone to UTC does enable correlation of events across 
> servers in different time zones (assuming they are all synchronized 
> with a common time source). WebSphere MQ stores all timestamps in UTC 
> for that reason. For a distributed application, this can be useful.
>
> If the user can set the TZ at the appender level, why not support a TZ 
> at the repository level as well? Most of the time it won't be used, 
> but if a user wants all appenders to use a non-default TZ they can set 
> it there.

Allowing TZ to be specified at the repository level would add an 
interaction between layout and repository that I don't believe 
currently exists, and I don't see that it adds much additional value.  I 
think that we should pick one place to add it and then gather feedback 
to see if we picked the wrong place or need to add additional 
specification points.

Do you have a preference between specifying timezone within the pattern 
layout (which would allow multiple time renderings in different zones 
within one rendered message) or as an attribute of the layout?

>
> The distributed app scenario also supports the outputting of events 
> through different appenders in different languages. I worked on a 
> large airline reservations system in Germany a while back that had 
> support groups in Germany, France and Italy. I'm sure they would have 
> liked to see events in their own language. It also comes into play for 
> product support--a support group may want a customer to enable an 
> appender for troubleshooting and have the events output in the 
> language they understand, not the language the customer uses.
>

I don't see the use case as unreasonable and think that it should not 
be prematurely discarded.  It will take a bit of exploring to see how 
hard and expensive (or not) it would be in practice.




Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Mark R Durman/CA/US/MQSolutions <md...@mqsolutions.com>.
Setting the Time Zone to UTC does enable correlation of events across 
servers in different time zones (assuming they are all synchronized with a 
common time source). WebSphere MQ stores all timestamps in UTC for that 
reason. For a distributed application, this can be useful.

If the user can set the TZ at the appender level, why not support a TZ at 
the repository level as well? Most of the time it won't be used, but if a 
user wants all appenders to use a non-default TZ they can set it there.

The distributed app scenario also supports the outputting of events 
through different appenders in different languages. I worked on a large 
airline reservations system in Germany a while back that had support 
groups in Germany, France and Italy. I'm sure they would have liked to see 
events in their own language. It also comes into play for product 
support--a support group may want a customer to enable an appender for 
troubleshooting and have the events output in the language they 
understand, not the language the customer uses.

Mark Durman







Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Curt Arnold <ca...@apache.org>.
On Dec 21, 2004, at 10:44 AM, Ceki Gülcü wrote:

> At 12:01 AM 12/21/2004, Curt Arnold wrote:
>
> The Object to String conversion is done once and only once. The result 
> is cached and subsequently shared by all appenders. While I can 
> imagine having two emails (thus two layouts) having different 
> timezones, I can't see the use case for outputting an event in German 
> through one appender, Dutch in another and in English in a third.
>

I think the multi-locale use case is reasonable, however it may be one 
that we reject as a requirement.  I'd like to see how it plays out 
before rejecting it.


>> Locale and timezone, like layout, are accommodations of the 
>> preferences of a particular audience being reached through the 
>> appender.  Can you think of reasons that you would want to specify 
>> them at a higher level?
>
> I have to admit that I actually have a hard time imagining any use for 
> setting the TimeZone because:
>
> 1) if events are to be viewed locally, they will be formatted using 
> the system default TimeZone which is usually the same as the desired 
> TimeZone.
>
> 2) if events need to be viewed remotely, they are transmitted in a 
> TZ-neutral form over the wire. The recipient can output the event in 
> its local TimeZone.
>
> The only remaining case is then the user wanting to output the event 
> in a TZ other than that of her system (default) TZ. However, in 
> that case we could ask the user to specify a non default TZ for each 
> of her PatternLayouts that require it. No need to specify it at the 
> LoggerRepository level.

I would think the most common use for an arbitrary timezone would be to 
specify that the time should be rendered as UTC.  I do think that is 
still at the layout level (whether or not embedded in the pattern 
specification)
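
For reference, rendering one instant in UTC and in a local zone with the standard API looks like the sketch below. The class name ZoneRenderingDemo, the sample instant and the zone IDs are illustrative assumptions; nothing here is log4j configuration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo of what a layout-level timezone setting would control.
final class ZoneRenderingDemo {
    static String render(long epochMillis, String zoneId) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        // Without this call the JVM default timezone is used.
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long instant = 1103587200000L; // midnight UTC on 2004-12-21
        System.out.println(render(instant, "UTC"));
        System.out.println(render(instant, "America/Chicago"));
    }
}
```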

>
>> Implementing appender level locale rendering would likely involve 
>> creating threads to do rendering on non-default locales in some 
>> instances and would likely have some performance hit, but shouldn't 
>> significantly affect performance when not specified.   However, it is going 
>> to take some experimentation to see where it can be effectively 
>> performed.
>
> The conjunction of the words "level" and "locale" made me think of the 
> case where the level string was output in the user's locale. So, 
> English speaking users would see TRACE, DEBUG, INFO, WARN, ERROR, 
> EMERG a French speaker would see TRACE, BOGUE, INFO, AVERTISSEMENT, 
> ERREUR, URGENCE a Turkish speaker would see IZ, DEBUG, BILGI, DIKKAT, 
> YANLIS, IMDAT.
>

I think working the locale issue through is going to take some 
experimentation.  It would probably be best to take a shot at a 
regexp-based message localizing layout.  Once we have that piece, then 
we can experiment on the localization of level names and 
ObjectRendering.  However, I'm going to have to say that I can't do 
that until after log4cxx 0.9.8 snapshot is stable.




Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Ceki Gülcü <ce...@qos.ch>.
At 12:01 AM 12/21/2004, Curt Arnold wrote:

>I'm going to have to do some research before I can make a reasonable proposal.
>
>Here is a use case that I think suggests that Layout or Appender is the 
>right level: Send logging events to Ceki in fr-CH localized email messages 
>with time in Central European Timezone and to Curt in en-US email messages 
>with time in US Central time zone.
>
>However, if you were using a SocketAppender instead and receiving in 
>Chainsaw, there would not be a layout involved, however you would want to 
>be able to control the locale used in the Object.toString() call used to 
>render non-string messages.  Timezone would not come into play until a 
>layout was involved.
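
The email use case above can be sketched with the standard API as follows. RecipientPreferences is a hypothetical class, and the zone IDs Europe/Zurich and America/Chicago are assumed stand-ins for "Central European Timezone" and "US Central time zone". The exact output text depends on the JDK's locale data, so none is shown.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical per-recipient settings, as an appender or layout might hold.
final class RecipientPreferences {
    private final Locale locale;
    private final TimeZone zone;

    RecipientPreferences(String languageTag, String zoneId) {
        this.locale = Locale.forLanguageTag(languageTag);
        this.zone = TimeZone.getTimeZone(zoneId);
    }

    String renderTimestamp(long epochMillis) {
        DateFormat fmt = DateFormat.getDateTimeInstance(
                DateFormat.LONG, DateFormat.LONG, locale);
        fmt.setTimeZone(zone);
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long instant = 1103587200000L; // midnight UTC on 2004-12-21
        RecipientPreferences ceki =
                new RecipientPreferences("fr-CH", "Europe/Zurich");
        RecipientPreferences curt =
                new RecipientPreferences("en-US", "America/Chicago");
        System.out.println(ceki.renderTimestamp(instant));
        System.out.println(curt.renderTimestamp(instant));
    }
}
```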

The Object to String conversion is done once and only once. The result is 
cached and subsequently shared by all appenders. While I can imagine having 
two emails (thus two layouts) having different timezones, I can't see the 
use case for outputting an event in German through one appender, Dutch in 
another and in English in a third.

>Locale and timezone, like layout, are accommodations of the preferences of 
>a particular audience being reached through the appender.  Can you think 
>of reasons that you would want to specify them at a higher level?

I have to admit that I actually have a hard time imagining any use for 
setting the TimeZone because:

1) if events are to be viewed locally, they will be formatted using the 
system default TimeZone which is usually the same as the desired TimeZone.

2) if events need to be viewed remotely, they are transmitted in a 
TZ-neutral form over the wire. The recipient can output the event in its 
local TimeZone.

The only remaining case is then the user wanting to output the event in a 
TZ other than that of her system (default) TZ. However, in that case we 
could ask the user to specify a non default TZ for each of her 
PatternLayouts that require it. No need to specify it at the 
LoggerRepository level.

>Implementing appender level locale rendering would likely involve creating 
>threads to do rendering on non-default locales in some instances and would 
>likely have some performance hit, but shouldn't significantly affect performance 
>when not specified.   However, it is going to take some experimentation to 
>see where it can be effectively performed.

The conjunction of the words "level" and "locale" made me think of the case 
where the level string was output in the user's locale. So, English 
speaking users would see TRACE, DEBUG, INFO, WARN, ERROR, EMERG a French 
speaker would see TRACE, BOGUE, INFO, AVERTISSEMENT, ERREUR, URGENCE a 
Turkish speaker would see IZ, DEBUG, BILGI, DIKKAT, YANLIS, IMDAT.

Many developers would want to see the level strings translated to their own 
language while many others would prefer the English terms. However, their 
preference would probably be the same for all appenders. So, setting the 
Locale at the LoggerRepository level makes sense. The user could later 
override the output language of the level strings (i.e. the locale) for a 
specific layout instance (most probably within the %level pattern word).
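
A minimal sketch of that lookup, assuming a repository-wide default locale with an optional per-layout override, is shown below. The class LocalizedLevelNames and its method are hypothetical; only the level names themselves come from the lists above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table; not an existing log4j class.
final class LocalizedLevelNames {
    private static final String[] ENGLISH =
            {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "EMERG"};
    private static final Map<String, String[]> TRANSLATIONS = new HashMap<>();
    static {
        TRANSLATIONS.put("fr", new String[] {
            "TRACE", "BOGUE", "INFO", "AVERTISSEMENT", "ERREUR", "URGENCE"});
        TRANSLATIONS.put("tr", new String[] {
            "IZ", "DEBUG", "BILGI", "DIKKAT", "YANLIS", "IMDAT"});
    }

    // layoutOverride may be null, in which case the repository locale applies.
    static String levelName(String englishName, Locale repositoryLocale,
                            Locale layoutOverride) {
        Locale effective =
                layoutOverride != null ? layoutOverride : repositoryLocale;
        String[] names = effective == null
                ? null : TRANSLATIONS.get(effective.getLanguage());
        if (names == null) {
            return englishName; // no translation known: keep the English term
        }
        for (int i = 0; i < ENGLISH.length; i++) {
            if (ENGLISH[i].equals(englishName)) {
                return names[i];
            }
        }
        return englishName;
    }

    public static void main(String[] args) {
        System.out.println(levelName("WARN", Locale.FRENCH, null));
        System.out.println(levelName("WARN", Locale.FRENCH, Locale.ENGLISH));
    }
}
```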

If the Locale could be set at the LoggerRepository level, in order to 
obtain localized level strings, the user could simply write:

<configuration xmlns="http://...">

   <!-- new action: -->
   <localized-level-strings/>

   <!-- the rest remains same as before -->
   <appender name="A1">  ... </appender>
   <appender name="A2">  ...  </appender>
   <root> <appender-ref ref="A1"/> .... </root>
</configuration>

Thanks for bearing with me this far.


-- 
Ceki Gülcü

   The complete log4j manual: http://qos.ch/log4j/





Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Curt Arnold <ca...@apache.org>.
On Dec 20, 2004, at 3:26 PM, Ceki Gülcü wrote:
> As mentioned earlier, I think CachedDateFormat may fail for certain
> patterns at certain dates. If we can recognize the limited number of
> formats for which it fails (if it does) and sidestep those, then
> fine. Before going any further, do you agree that patterns causing
> CachedDateFormat to fail exist and that it's just not me making things
> up?
>

For those tuning in late: The basic idea of the cached date format is 
that if the time is within the same integral second as a previous 
request, then only the milliseconds field needs to be rewritten.  To 
find the milliseconds field, on the first request (or any request where 
the total length of the formatted field has changed), two times only 
differing in the number of milliseconds are output and the results are 
analyzed.  If the milliseconds format is unrecognized, then the 
CachedDateFormat will simply delegate to the underlying DateFormat.
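
For readers who want to see the idea in code, here is a simplified sketch written for this explanation. It is not the CachedDateFormat from the bug report: the class MillisCachingFormatter, its method names and the probe values 123 and 987 are assumptions, and it recognizes only a single three-digit field made of ASCII digits.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the caching idea; not the class in Bugzilla.
final class MillisCachingFormatter {
    private final DateFormat delegate;
    private final StringBuilder cache = new StringBuilder();
    private long cachedSecond = Long.MIN_VALUE; // start of cached second (ms)
    private int millisStart = -1;               // -1: unknown, always delegate

    MillisCachingFormatter(DateFormat delegate) {
        this.delegate = delegate;
    }

    // Formats two instants in the same second that differ in every
    // millisecond digit; returns where the outputs diverge, or -1 if the
    // difference is not a single three-digit field.
    static int findMillisStart(DateFormat fmt, long secondStart) {
        String a = fmt.format(new Date(secondStart + 123));
        String b = fmt.format(new Date(secondStart + 987));
        if (a.length() != b.length()) {
            return -1;
        }
        int i = 0;
        while (i < a.length() && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        boolean recognized = a.startsWith("123", i) && b.startsWith("987", i)
                && a.substring(i + 3).equals(b.substring(i + 3));
        return recognized ? i : -1;
    }

    synchronized String format(long now) {
        long second = Math.floorDiv(now, 1000L) * 1000L;
        if (second != cachedSecond) {
            // New second: reformat fully, re-detect only if the length changed.
            String full = delegate.format(new Date(now));
            if (full.length() != cache.length()) {
                millisStart = findMillisStart(delegate, second);
            }
            cache.setLength(0);
            cache.append(full);
            cachedSecond = second;
            return full;
        }
        if (millisStart < 0) {
            return delegate.format(new Date(now)); // unrecognized format
        }
        // Same second: overwrite only the three millisecond digits.
        int ms = (int) (now - second);
        cache.setCharAt(millisStart, (char) ('0' + ms / 100));
        cache.setCharAt(millisStart + 1, (char) ('0' + ms / 10 % 10));
        cache.setCharAt(millisStart + 2, (char) ('0' + ms % 10));
        return cache.toString();
    }

    public static void main(String[] args) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        MillisCachingFormatter cached = new MillisCachingFormatter(fmt);
        long base = 1103587200000L; // midnight UTC on 2004-12-21
        System.out.println(cached.format(base + 5));    // full format
        System.out.println(cached.format(base + 42));   // digits rewritten
        System.out.println(cached.format(base + 1007)); // new second
    }
}
```

The sketch shares the weakness discussed in this thread: if the millisecond field moves while the total length stays the same, the stale position is reused.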

CachedDateFormat would not be able to detect the milliseconds field on 
RelativeTimeDateFormat unless the starting time was an integral second 
and would not be able to detect millisecond fields if non-arabic digits 
were set.  In either of these cases, you would have an extra call per 
format evaluation.  I believe the original patch avoided caching 
RelativeTimeDateFormat.

The worst-case scenario is if you could construct a date-time format 
where the location of the millisecond field changed, but the total 
length of the field did not.  I don't think that you could create one 
with SimpleDateFormat, however you could obviously write a custom 
DateFormat that did.

There is an observable difference when running the performance tests to 
a null appender with CachedDateFormat.  However, it may not be 
significant in more realistic deployments.  It is a significant 
improvement over the flawed (and currently unused) caching code in the 
original DateFormats.  However, the original motivation for the caching 
may no longer be relevant and so a new CachedDateFormat may not have a 
performance benefit that justifies the added complexity.


>
>> CachedDateFormat attempted to support multiple digit sets.  However, 
>> I couldn't find any stock Java locales that used a digit set other 
>> than 0-9 in its date formats.  I had expected that the Thai locale 
>> would use Thai digits, but I was wrong.
>
> If I am not mistaken, the existing code in CachedDateFormat only
> localized the digit 0. Which may be enough in case the
> SimpleDateFormat instance and CachedDateFormat instance use the same
> Localization but if not, then the output will be inconsistent.

The second pass used localized values of both 0 and 9 to identify the 
millisecond field.  If the default locale changed, CachedDateFormat 
would not switch locales until the next integral second.  There may be 
other issues that come to pass with any locale rework, so maybe the 
best approach is to leave CachedDateFormat out for now.  It will be 
available in Bugzilla in case someone ever wants to add it later.


>
>> Date formatting was affected by the current locale and timezone of 
>> the thread and there was no mechanism to configure a timezone or 
>> locale to be used.  The existing patches added configurable timezones 
>> and locales to the pattern layout which would modify the behavior of 
>> the date formats.  Based on some of the previous discussions on the 
>> Jakarta Commons Dev list, I'd like to evaluate whether Appender is a 
>> better place for the locale to be specified.
>>
>> What I'd like to do is:
>>
>> Commit simplifications to the DateFormat's and add CachedDateFormat 
>> but simplified to only recognize arabic digit sets.
>
> That would be good.
>
>> Review configurable locales and timezones and come back to the list 
>> with a specific recommendation.  My current take is that appender is 
>> probably a more appropriate place to specify locale.  However, that 
>> should be considered in a bigger scope where locale affects both the 
>> layout and rendering non-string messages.  TimeZone is likely still 
>> appropriate to configure on the layout.
>
> This raises a much wider question. Should a given customization be
> allowed at the logger repository level, logger level, appender level,
> at the layout level or at the pattern converter level? Getting the
> answer right provides tremendous added value. For example, the named
> logger hierarchy propagates 'level' values according to the level
> inheritance rule. This in turn provides a very fast, yet meaningful
> filtering mechanism for categorizing logging statements. The fact that
> we got this question right is one of the main reasons behind log4j's
> success. Appender additivity is another example showing that getting
> the collaboration rules between components right makes a big
> difference.
>
> I happen to think that the logger repository should/can be viewed as
> the central point influencing all the components attached to it. For
> example,
>
> 1) properties of the logger repository should/can be visible at all
> component levels.
>
> 2) new pattern conversion rules defined at the logger repository level
> should/can be shared by all the instances of PatternLayout attached to
> that logger repository.
>
> 3) a resource bundle attached to a logger repository should/can be
> shared by all *loggers* (hint hint), appenders and layouts.
>
> 4) The mapping URL (defined below) attached to a logger repository
> should/can be shared by all instances of the %logger2 pattern converter.
>
> In "should/can", the "should" part signifies my current inclination to
> think of the above as good design. The "can" part means that design is
> still open for debate.
>
> What is the mapping URL?
> ------------------------
>
> We routinely write o.a.l.r.RollingFileAppender instead of
> org.apache.log4j.rolling.RollingFileAppender. The first form is almost
> as precise and much shorter. Whenever I get the chance, I'd like to
> implement a pattern converter named %logger2 which instead of printing
> org.apache.log4j.rolling.RollingFileAppender will print
> o.a.l.r.RollingFileAppender. The shortened forms will be defined in a
> properties file defined by the user. (We will provide a default 
> mapping.)
> The location of this mapping will be specified with a URL hence the
> term "mapping URL".
>
> Coming back to the TimeZone question, we could imagine that a TimeZone
> could be set at the LoggerRepository level. This TimeZone would
> percolate down to all levels below. However, if needed it could be
> overridden at a lower level, e.g. the pattern converter level. Can the
> TimeZone influence multiple pattern converters of a PatternLayout? If
> that's not a plausible scenario, then it does not make sense to define
> a TimeZone at the Appender level nor at the PatternLayout level.
>
> Providing too many or meaningless extension/customization points will
> confuse the user, make things harder to manage for her, and make the
> code harder to maintain for us. Getting the collaboration rules right
> makes all the difference in the world.
>


I'm going to have to do some research before I can make a reasonable 
proposal.

Here is a use case that I think suggests that Layout or Appender is the 
right level: Send logging events to Ceki in fr-CH localized email 
messages with time in Central European Timezone and to Curt in en-US 
email messages with time in US Central time zone.

However, if you were using a SocketAppender instead and receiving in 
Chainsaw, there would not be a layout involved, but you would want 
to be able to control the locale used in the Object.toString() call 
used to render non-string messages.  Timezone would not come into play 
until a layout was involved.

You could either specify TimeZone as  a property of the Layout, in 
which case all time formats (likely one, but possibly more) within a 
message would be in a single time zone, or you could extend the pattern 
syntax for dates to include a timezone specifier.  The second would 
allow you to represent the time, for example, in both GMT and a local 
time within the same formatted message.  I chose the first since I'm a 
wimp and it was easier.
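
To make the first option concrete, here is a small sketch using only the JDK's SimpleDateFormat. The class name ZoneRendering and the helper render are hypothetical and not part of log4j. A format configured once with a time zone renders every date in that zone, so showing GMT and a local time side by side needs two formatter instances, which is what a per-conversion timezone specifier would amount to.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of log4j.
class ZoneRendering {

    /** Formats the instant with the given pattern in the given time zone. */
    static String render(Date when, String pattern, String zoneId) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone(zoneId));
        return f.format(when);
    }

    public static void main(String[] args) {
        Date when = new Date();
        String pattern = "yyyy-MM-dd HH:mm:ss,SSS z";
        // Same instant, two audiences.
        System.out.println(render(when, pattern, "GMT"));
        System.out.println(render(when, pattern, "America/Chicago"));
    }
}
```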

Locale and timezone, like layout, are accommodations of the preferences 
of a particular audience being reached through the appender.  Can you 
think of reasons that you would want to specify them at a higher level?

Implementing appender level locale rendering would likely involve 
creating threads to do rendering on non-default locales in some 
instances and would likely have some performance hit, but shouldn't 
significantly affect performance when not specified.   However, it is going to 
take some experimentation to see where it can be effectively performed.



---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org


Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Curt Arnold <ca...@apache.org>.
>
> At this moment, I'm too tired to try to fully understand why it fails 
> and how it could be fixed. More tomorrow.
>
>

The underlying code did not anticipate a two-letter 'SS' field, with 
which I assume milliseconds 0 to 99 are represented with two digits and 
100-999 with three.  I have attached another patch file to the bug with 
the fix and your test added to the unit tests.  Basically, if it 
doesn't see "000" and "987", it will just delegate to the inner date 
format.  Previously, it only checked for a "0" and "9".   You could 
probably still mess up the caching by specifying a "SS0" format.
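
The variable width of a two-letter millisecond field can be checked directly against the JDK. In this minimal sketch (the class name TwoDigitMillis and the helper millisText are made up for the example), the pattern letter count is only a minimum number of digits, so "SS" pads small values to two digits but still prints three digits for larger ones.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of log4j.
class TwoDigitMillis {

    /** Formats an instant 'millis' milliseconds after the epoch. */
    static String millisText(String pattern, int millis) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.format(new Date(millis));
    }

    public static void main(String[] args) {
        System.out.println(millisText("SS", 23));   // prints 23
        System.out.println(millisText("SS", 905));  // prints 905
        System.out.println(millisText("SSS", 23));  // prints 023
    }
}
```

Because the output length can change between two requests in the same second, a cache that overwrites a fixed-width field in place cannot handle this pattern and has to delegate.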

I'm fine with dropping CachedDateFormat.  I wrote the original 
iteration before I realized that the buggy caching code was no longer 
used.

Specifying the Locale should probably be held off to be done in 
conjunction with the localizing layout that I had discussed on the 
commons-dev mailing list.

Probably should address TimeZone in the near future.  Specifying it on 
the layout is simpler and keeps the content between the curly braces 
consistent with JDK's SimpleDateFormat.   However, it doesn't allow you 
to use multiple time zones in one log message.

How about:

An optional {tz=} following %d specifies the time zone:

%d{tz=GMT}{yyyy-MM-dd HH:mm:ss,SSS} Z : %d{yyyy-MM-dd HH:mm:ss,SSS z} - %m
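
A sketch of how such a specifier might be expanded is shown below. The {tz=} syntax is only the proposal from this message, and the class TzDateExpander, its regular expression, and the default pattern are assumptions for illustration; log4j's actual pattern parser works differently. The sketch handles only %d and leaves other conversion characters such as %m untouched.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed syntax, not log4j code.
class TzDateExpander {

    // %d, then an optional {tz=ID}, then an optional {date pattern}.
    private static final Pattern SPEC =
        Pattern.compile("%d(?:\\{tz=([^}]+)\\})?(?:\\{([^}]*)\\})?");

    // Assumed default when no date pattern is given.
    private static final String DEFAULT_PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";

    /** Replaces every %d conversion in the layout with the formatted date. */
    static String expand(String layout, Date when) {
        Matcher m = SPEC.matcher(layout);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String zoneId = m.group(1);
            String pattern = m.group(2) != null ? m.group(2) : DEFAULT_PATTERN;
            SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
            if (zoneId != null) {
                // Without {tz=}, the JVM default time zone applies.
                f.setTimeZone(TimeZone.getTimeZone(zoneId));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(f.format(when)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String layout =
            "%d{tz=GMT}{HH:mm:ss,SSS} GMT : %d{tz=America/Chicago}{HH:mm:ss,SSS z} - %m";
        System.out.println(expand(layout, new Date()));
    }
}
```

Keeping the time zone in its own brace group leaves the second group identical to a plain SimpleDateFormat pattern, which is the consistency argument made above.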






Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Ceki Gülcü <ce...@qos.ch>.
Not necessarily the most convincing use case, but the following fails,

import org.apache.log4j.*;
import org.apache.log4j.helpers.*;
import java.text.*;
import java.util.*;

public class CDF {

   protected static FieldPosition pos = new FieldPosition(0);

   public static void main(String[] args) throws Exception {

     SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MMMM-dd HH:mm:ss,SS Z");
     CachedDateFormat cdf1 = new CachedDateFormat(sdf1);
     StringBuffer buf = new StringBuffer();

     Calendar c = Calendar.getInstance();
     c.set(2004, Calendar.DECEMBER, 12, 20, 0);
     c.set(Calendar.SECOND, 37);
     c.set(Calendar.MILLISECOND, 23);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());
     buf.setLength(0);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());
     buf.setLength(0);

     c.set(2005, Calendar.JANUARY, 1, 0, 0);
     c.set(Calendar.SECOND, 13);
     c.set(Calendar.MILLISECOND, 905);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());
     buf.setLength(0);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());

   }
}

It incorrectly outputs

2004-December-12 20:00:37,23 +0100
2004-December-12 20:00:37,023+0100
2005-January-01 00:00:13,905 +0100
2005-January-01 00:00:13,9905+0100

At this moment, I'm too tired to try to fully understand why it fails and 
how it could be fixed. More tomorrow.


-- 
Ceki Gülcü

   The complete log4j manual: http://qos.ch/log4j/





Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Ceki Gülcü <ce...@qos.ch>.
Curt,

At 10:26 PM 12/20/2004, Ceki Gülcü wrote:

>As mentioned earlier, I think CachedDateFormat may fail for certain
>patterns at certain dates. If we can recognize the limited number of
>formats for which it fails (if it does) and sidestep those, then
>fine. Before going any further, do you agree that patterns causing
>CachedDateFormat to fail exist and that it's not just me making things
>up?

I had predicted the following would produce incorrect results.

import org.apache.log4j.*;
import org.apache.log4j.helpers.*;
import java.text.*;
import java.util.*;

public class CDF {

   protected static FieldPosition pos = new FieldPosition(0);

   public static void main(String[] args) throws Exception {

     SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MMMM-dd HH:mm:ss,SSS");
     CachedDateFormat cdf1 = new CachedDateFormat(sdf1);
     StringBuffer buf = new StringBuffer();

     Calendar c = Calendar.getInstance();
     c.set(2004, Calendar.DECEMBER, 12, 20, 0);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());
     buf.setLength(0);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());
     buf.setLength(0);

     c.set(2005, Calendar.JANUARY, 1, 0, 0);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());
     buf.setLength(0);

     cdf1.format(c.getTime(), buf, pos);
     System.out.println(buf.toString());

   }
}

Instead, it produces

2004-December-12 20:00:11,200
2004-December-12 20:00:11,200
2005-January-01 00:00:11,200
2005-January-01 00:00:11,200

which is correct. So much for my predictions.


-- 
Ceki Gülcü

   The complete log4j manual: http://qos.ch/log4j/





Re: TimeZone and locale for PatternLayout Was: [RESULT][VOTE]

Posted by Ceki Gülcü <ce...@qos.ch>.
At 06:11 PM 12/20/2004, Curt Arnold wrote:

>The existing AbsoluteTimeDateFormat, ISO8601DateFormat, and 
>DateTimeDateFormat contained buggy caching code and had been effectively 
>abandoned since PatternLayout no longer created these classes, but created 
>java.text.SimpleDateFormat objects.  The proposed resolution was to 
>reimplement those classes as wrappers of SimpleDateFormat.

yes, and imho it's quite a bright proposal too.

>The flawed caching code in the unused DateFormat's, if properly 
>implemented, could result in a noticeable performance benefit.  A new 
>class, CachedDateFormat, was written that could wrap any DateFormat.
>If the class is introduced, then PatternLayout should be modified to wrap 
>the DateFormat that it constructs with CachedDateFormat.  If 
>CachedDateFormat proved to be unreliable, then it would be trivial to 
>remove by changing a line or so in PatternLayout.


As mentioned earlier, I think CachedDateFormat may fail for certain
patterns at certain dates. If we can recognize the limited number of
formats for which it fails (if it does) and sidestep those, then
fine. Before going any further, do you agree that patterns causing
CachedDateFormat to fail exist and that it's not just me making things
up?


>CachedDateFormat attempted to support multiple digit sets.  However, I 
>couldn't find any stock Java locales that used a digit set other than 0-9 
>in its date formats.  I had expected that the Thai locale would use Thai 
>digits, but I was wrong.

If I am not mistaken, the existing code in CachedDateFormat only
localized the digit 0. Which may be enough in case the
SimpleDateFormat instance and CachedDateFormat instance use the same
Localization but if not, then the output will be inconsistent.

>Date formatting was affected by the current locale and timezone of the 
>thread and there was no mechanism to configure a timezone or locale to be 
>used.  The existing patches added configurable timezones and locales to 
>the pattern layout which would modify the behavior of the date 
>formats.  Based on some of the previous discussions on the Jakarta Commons 
>Dev list, I'd like to evaluate whether Appender is a better place for the 
>locale to be specified.
>
>What I'd like to do is:
>
>Commit simplifications to the DateFormat's and add CachedDateFormat but 
>simplified to only recognize arabic digit sets.

That would be good.

>Review configurable locales and timezones and come back to the list with a 
>specific recommendation.  My current take is that appender is probably a 
>more appropriate place to specify locale.  However, that should be 
>considered in a bigger scope where locale affects both the layout and 
>rendering non-string messages.  TimeZone is likely still appropriate to 
>configure on the layout.

This raises a much wider question. Should a given customization be
allowed at the logger repository level, logger level, appender level,
at the layout level or at the pattern converter level? Getting the
answer right provides tremendous added value. For example, the named
logger hierarchy propagates 'level' values according to the level
inheritance rule. This in turn provides a very fast, yet meaningful
filtering mechanism for categorizing logging statements. The fact that
we got this question right is one of the main reasons behind log4j's
success. Appender additivity is another example showing that getting
the collaboration rules between components right makes a big
difference.

I happen to think that the logger repository should/can be viewed as
the central point influencing all the components attached to it. For
example,

1) properties of the logger repository should/can be visible at all
component levels.

2) new pattern conversion rules defined at the logger repository level
should/can be shared by all the instances of PatternLayout attached to
that logger repository.

3) a resource bundle attached to a logger repository should/can be
shared by all *loggers* (hint hint), appenders and layouts.

4) The mapping URL (defined below) attached to a logger repository
should/can be shared by all instances of the %logger2 pattern converter.

In "should/can", the "should" part signifies my current inclination to
think of the above as good design. The "can" part means that design is
still open for debate.

What is the mapping URL?
------------------------

We routinely write o.a.l.r.RollingFileAppender instead of
org.apache.log4j.rolling.RollingFileAppender. The first form is almost
as precise and much shorter. Whenever I get the chance, I'd like to
implement a pattern converter named %logger2 which instead of printing
org.apache.log4j.rolling.RollingFileAppender will print
o.a.l.r.RollingFileAppender. The shortened forms will be defined in a
properties file defined by the user. (We will provide a default mapping.)
The location of this mapping will be specified with a URL hence the
term "mapping URL".
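
A minimal sketch of the lookup such a converter might perform is given below. The class LoggerNameShortener, the properties layout (package name as key, short form as value), and the longest-prefix rule are all assumptions made for this example; loading the properties from the mapping URL is left out.

```java
import java.util.Properties;

// Hypothetical sketch of the %logger2 lookup, not log4j code.
class LoggerNameShortener {

    /**
     * Replaces the longest mapped package prefix of the logger name
     * with its short form. Names without a mapped prefix are returned
     * unchanged.
     */
    static String shorten(String loggerName, Properties mapping) {
        int end = loggerName.lastIndexOf('.');
        while (end > 0) {
            String prefix = loggerName.substring(0, end);
            String shortForm = mapping.getProperty(prefix);
            if (shortForm != null) {
                // Keep the remainder, including its leading dot.
                return shortForm + loggerName.substring(end);
            }
            end = loggerName.lastIndexOf('.', end - 1);
        }
        return loggerName;
    }

    public static void main(String[] args) {
        Properties mapping = new Properties();
        mapping.setProperty("org.apache.log4j.rolling", "o.a.l.r");
        // prints o.a.l.r.RollingFileAppender
        System.out.println(
            shorten("org.apache.log4j.rolling.RollingFileAppender", mapping));
    }
}
```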

Coming back to the TimeZone question, we could imagine that a TimeZone
could be set at the LoggerRepository level. This TimeZone would
percolate down to all levels below. However, if needed it could be
overridden at a lower level, e.g. the pattern converter level. Can the
TimeZone influence multiple pattern converters of a PatternLayout? If
that's not a plausible scenario, then it does not make sense to define
a TimeZone at the Appender level nor at the PatternLayout level.
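
The percolation idea can be sketched the same way level inheritance works: each component either carries its own value or defers to its parent. The class TimeZoneScope below is purely illustrative and assumes a simple parent chain (repository, then layout, then pattern converter); nothing like it exists in log4j.

```java
import java.util.TimeZone;

// Hypothetical sketch of inherited settings, not log4j code.
class TimeZoneScope {

    private final TimeZoneScope parent; // null for the repository scope
    private TimeZone timeZone;          // null means "inherit from parent"

    TimeZoneScope(TimeZoneScope parent) {
        this.parent = parent;
    }

    void setTimeZone(TimeZone timeZone) {
        this.timeZone = timeZone;
    }

    /** Walks up the chain until some scope defines a time zone. */
    TimeZone effectiveTimeZone() {
        for (TimeZoneScope s = this; s != null; s = s.parent) {
            if (s.timeZone != null) {
                return s.timeZone;
            }
        }
        return TimeZone.getDefault(); // nothing configured anywhere
    }

    public static void main(String[] args) {
        TimeZoneScope repository = new TimeZoneScope(null);
        TimeZoneScope layout = new TimeZoneScope(repository);
        TimeZoneScope converter = new TimeZoneScope(layout);

        repository.setTimeZone(TimeZone.getTimeZone("GMT"));
        System.out.println(converter.effectiveTimeZone().getID()); // GMT

        converter.setTimeZone(TimeZone.getTimeZone("America/Chicago"));
        System.out.println(converter.effectiveTimeZone().getID()); // America/Chicago
    }
}
```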

Providing too many or meaningless extension/customization points will
confuse the user, make things harder to manage for her, and make the
code harder to maintain for us. Getting the collaboration rules right
makes all the difference in the world.



-- 
Ceki Gülcü

   The complete log4j manual: http://qos.ch/log4j/


