You are viewing a plain text version of this content. The canonical link for it is here.

Posted to log4j-dev@logging.apache.org by Curt Arnold <ca...@apache.org> on 2004/12/27 22:45:43 UTC

LoggingEvent.sequenceCount

I'm a bit concerned about LoggingEvent.sequenceNumber and 
sequenceCount.  It appears from a code review that unless externally 
synchronized, it would be possible to end up with multiple logging 
events with the same sequence number since the value of sequenceCount 
may be slightly out of sync on different threads.

Before diving into addressing any issues around that, I'd like to know 
more of the motivation for adding sequence number and implementing it 
on LoggingEvent.   I did a simple search on the mailing list and didn't 
find any messages that appeared to be on this topic.

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Endre Stølsvik <En...@Stolsvik.com>.

On Tue, 28 Dec 2004, Ceki Gülcü wrote:

| Anyway, a non-trivial amount of thinking went into the implementation of
| the equals() method. In the absence of a malevolent adversary, it is also
| correct although only probabilistically so. Its only downside is the
| addition of the static sequenceCount field and the sequenceNumber field in
| LoggingEvent.

How often is this number increased?

LoggingEvent sounds like a thing that is made for every log-invocation
that goes through (that is, that the log-call's level is above or equal to
the category's level, and these evaluations).

If that is so, then having a synchronized method -on the class- sounds
like a noticable synchronization impact, if you log much. A synchronized
block is in itself somewhat expensive, but in a multi threaded application
(e.g. Servlet Container), if two synchs happen to "hit" (e.g. on a
multi-CPU, or HT, or even on a normal preemptive multitasking OS), then
the contention time might become, as I've understood it, "very" high. Thus
it is interesting to avoid global synch-bottelnecks.

Isn't there some synchronization going on anyway when a log-invocation
goes through, on the Category and Appender levels?  Couldn't this
LoggingEvent sequenced number be made at the same time, avoiding the
additional synch block? Maybe one could use some Category Id (if there is
such a thing, if not, maybe make it - this won't be an impact), and add a
category-local sequenced number (one of them multiplied by some 10^a
constant, to give two separate spaces). This would reduce the global
contention, at least it wouldn't introduce any more synchs.

Java 5.0 have synchronization contention monitoring accessible from within
the application - maybe some unit tests could be devised?
java.lang.management.ThreadMXBean.


PS: I agree that the name should be changed if it really shouldn't reflect
the sequence of the LoggingEvent, but merely is used as an equals-device.
Maybe call it logginEventUniqueId or something like this..

PPS: How much synchronization have gone into log4j? Ideally, I guess, it
should -only- synchronize on Logger.getLogger() and when actually doing
the appending?

Endre


---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Ceki Gülcü <ce...@qos.ch>.

Anyway, a non-trivial amount of thinking went into the implementation of 
the equals() method. In the absence of a malevolent adversary, it is also 
correct although only probabilistically so. Its only downside is the 
addition of the static sequenceCount field and the sequenceNumber field in 
LoggingEvent.

Besides helping in the efficient implementation of the equals() method 
(needed by Chainsaw), the sequenceNumber field in LoggingEvent also greatly 
simplifies the implementation of receivers needing to repeatedly query a 
data store, e.g. DBReceiver.

So imho sequenceCount provides real added value.

At 10:11 PM 12/28/2004, you wrote:
>At 09:49 PM 12/28/2004, Curt Arnold wrote:
>>The point that I was trying to make and which I still think is valid 
>>despite my mistaken analysis of the hashCode function, is that it is 
>>better to have Chainsaw to make whatever comparison it thinks is 
>>appropriate for its filtering needs and avoid having to worry about 
>>maintaining the equals contract and having a hashCode function that 
>>results in good performance if LoggingEvents are added to a HashTable.
>
>--
>Ceki Gülcü
>
>   The complete log4j manual: http://www.qos.ch/log4j/

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Ceki Gülcü <ce...@qos.ch>.

At 09:49 PM 12/28/2004, Curt Arnold wrote:
>The point that I was trying to make and which I still think is valid 
>despite my mistaken analysis of the hashCode function, is that it is 
>better to have Chainsaw to make whatever comparison it thinks is 
>appropriate for its filtering needs and avoid having to worry about 
>maintaining the equals contract and having a hashCode function that 
>results in good performance if LoggingEvents are added to a HashTable.

Are you worried about having to implement LoggingEvent.equals() and 
hashCode() in C++?


-- 
Ceki Gülcü

   The complete log4j manual: http://www.qos.ch/log4j/



---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Curt Arnold <ca...@apache.org>.

On Dec 28, 2004, at 2:06 PM, Ceki Gülcü wrote:
>>
>
> The code currently says:
>
>   /**
>    * The hashcode is computed as XOR of the lower 32 bits of 
> sequenceNumber and
>    * the higher 32 bits of timeStamp;
>    */
>   public int hashCode() {
>     return (int)((timeStamp >> 32) ^ (sequenceNumber & 0xFFFFFFFF));
>   }
>
> In what way does the comment not match the code?

Sorry, I misread the comment.  Thought it said lower 32 bits of the 
timestamp.


>
> The hashcode does not need to uniformly spread out the returned values 
> over the range of integer. It just has to make a minimal effort to 
> avoid that many values do not pile up on top of each other. Given that 
> all hash tables algorithms compute the hash value over modulo 'n' 
> where 'n' is a small prime number, for all practical purposes the 
> following hashcode would have been amply sufficient.
>
>  public int hashCode() {
>     return (int)((sequenceNumber & 0xFFFFFFFF));
>   }
>
> We are doing a little better by XORing with the higher 32 bits of the 
> timestamp. (2 to 32 milliseconds is about 50 days.) A slightly better 
> implementation would have been
>
>
>   public int hashCode() {
>     return (int)((timeStamp >> 20) ^ (sequenceNumber & 0xFFFFFFFF));
>   }
>
> 2^20 millis = 1048576 millis or about 17 minutes.

The point that I was trying to make and which I still think is valid 
despite my mistaken analysis of the hashCode function, is that it is 
better to have Chainsaw to make whatever comparison it thinks is 
appropriate for its filtering needs and avoid having to worry about 
maintaining the equals contract and having a hashCode function that 
results in good performance if LoggingEvents are added to a HashTable.


>
>
>> p.s. LoggingEvent has commented out methods xgetMDCCopy and 
>> XXgetMDCKeySet.  Are these going to be deleted or uncommented?
>
> They'll eventually be deleted.


---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Ceki Gülcü <ce...@qos.ch>.

At 08:23 PM 12/28/2004, Curt Arnold wrote:

>On Dec 28, 2004, at 7:50 AM, Ceki Gülcü wrote:
>The sequenceNumber+timestamp combination makes it easy to implement
>>>>>
>>>>>equals() method for LoggingEvent.
>>>
>>>
>>>The implementation might be easy, but appears problematic.  The default
>>>implementation of equals and getHashCode would be preferred if all you
>>>cared about was object identity.
>>
>>Object identity is not enough. Chainsaw as well as other receivers may
>>access a source of LoggingEvent multiple times. In that case, we need
>>an efficient way to discard duplicate LoggingEvents.
>
>It may be preferable to move the definition of "close enough for chainsaw" 
>or "close enough for queries" out into the consuming classes.  I'll have 
>to dig into this later since I'm on the road and not really set up for 
>this type of code analysis.
>
>If log4j 1.2.8 logging events are deserialized, it looks like they may all 
>have the same sequence number so chainsaw might want to additional
>checks if the timestamps are the same and the sequenceNumber is the 
>default value.
>
>By overriding equals, you also are required to override hashCode (which 
>you do) which ideally distributes the hash values uniformly over the value 
>space of int.  By my reading, it looks like the hash code is an XOR of the 
>high 32 bits of the timestamp (despite the comment) and the sequence 
>number.  If I'm reading it right, that looks like all logging events with 
>a given sequenceNumber within a fairly large timeframe (4 million seconds, 
>11 days or so) would have the same hash code.  Even it it is fixed to use 
>the lower 32-bits, the hash codes still would be fairly tightly 
>clustered.  We could write a better hashCode, I'm just pointing out one of 
>the side effects of choosing to override equals.


The code currently says:

   /**
    * The hashcode is computed as XOR of the lower 32 bits of 
sequenceNumber and
    * the higher 32 bits of timeStamp;
    */
   public int hashCode() {
     return (int)((timeStamp >> 32) ^ (sequenceNumber & 0xFFFFFFFF));
   }

In what way does the comment not match the code?

The hashcode does not need to uniformly spread out the returned values over 
the range of integer. It just has to make a minimal effort to avoid that 
many values do not pile up on top of each other. Given that all hash tables 
algorithms compute the hash value over modulo 'n' where 'n' is a small 
prime number, for all practical purposes the following hashcode would have 
been amply sufficient.

  public int hashCode() {
     return (int)((sequenceNumber & 0xFFFFFFFF));
   }

We are doing a little better by XORing with the higher 32 bits of the 
timestamp. (2 to 32 milliseconds is about 50 days.) A slightly better 
implementation would have been


   public int hashCode() {
     return (int)((timeStamp >> 20) ^ (sequenceNumber & 0xFFFFFFFF));
   }

2^20 millis = 1048576 millis or about 17 minutes.


>p.s. LoggingEvent has commented out methods xgetMDCCopy and 
>XXgetMDCKeySet.  Are these going to be deleted or uncommented?

They'll eventually be deleted.


-- 
Ceki Gülcü

   The complete log4j manual: http://www.qos.ch/log4j/



---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Curt Arnold <ca...@apache.org>.

On Dec 28, 2004, at 7:50 AM, Ceki Gülcü wrote:
The sequenceNumber+timestamp combination makes it easy to implement
>>>>
>>>> equals() method for LoggingEvent.
>>
>>
>> The implementation might be easy, but appears problematic.  The 
>> default
>> implementation of equals and getHashCode would be preferred if all you
>> cared about was object identity.
>
> Object identity is not enough. Chainsaw as well as other receivers may
> access a source of LoggingEvent multiple times. In that case, we need
> an efficient way to discard duplicate LoggingEvents.

It may be preferable to move the definition of "close enough for 
chainsaw" or "close enough for queries" out into the consuming classes. 
  I'll have to dig into this later since I'm on the road and not really 
set up for this type of code analysis.

If log4j 1.2.8 logging events are deserialized, it looks like they may 
all have the same sequence number so chainsaw might want to additional  
checks if the timestamps are the same and the sequenceNumber is the 
default value.

By overriding equals, you also are required to override hashCode (which 
you do) which ideally distributes the hash values uniformly over the 
value space of int.  By my reading, it looks like the hash code is an 
XOR of the high 32 bits of the timestamp (despite the comment) and the 
sequence number.  If I'm reading it right, that looks like all logging 
events with a given sequenceNumber within a fairly large timeframe (4 
million seconds, 11 days or so) would have the same hash code.  Even it 
it is fixed to use the lower 32-bits, the hash codes still would be 
fairly tightly clustered.  We could write a better hashCode, I'm just 
pointing out one of the side effects of choosing to override equals.

All in all, I think overriding equals just gives us headaches we could 
avoid by doing the appropriate comparison in the consuming classes.

p.s. LoggingEvent has commented out methods xgetMDCCopy and 
XXgetMDCKeySet.  Are these going to be deleted or uncommented?

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Ceki Gülcü <ce...@qos.ch>.

At 06:11 AM 12/28/2004, Curt Arnold wrote:

>Also, Item 48 of Effective Java (Synchronize access to shared mutable
>data) uses an example which appears to be almost exactly the scenario
>in LoggingEvent and goes to great length why the code may not perform
>as expected.  Basically each thread is free to maintain its own copy of
>sequenceCount and is only required to synchronize its values at thread
>synchronization.

Thanks for pointing this out. sequenceCount is a static long. Just the
'long' part mandates synchronization. The fact that is shared (static)
is another reason. Anyway, thanks again. A fix has been committed.

>>>The sequenceNumber+timestamp combination makes it easy to implement
>>>equals() method for LoggingEvent.
>
>
>The implementation might be easy, but appears problematic.  The default
>implementation of equals and getHashCode would be preferred if all you
>cared about was object identity.

Object identity is not enough. Chainsaw as well as other receivers may
access a source of LoggingEvent multiple times. In that case, we need
an efficient way to discard duplicate LoggingEvents.

>If you really care about value
>equality, you are saying that only the class type, timestamp and
>sequence number are significant.  I could abuse the current
>implementation of equals by creating a LoggingEvent, cloning it (which
>could be done using serialization if need be) and changing the level
>and message of the clone, original.equals(clone) would still return
>true since the timestamp and sequence number are still identical.

A malevolent adversary could easily fool the current equals
method. However, that would only cause a non-equal event as being seen
as equal, hence discarded as a duplicate. Not a very threatening
attack if you ask me. More importantly, log4j does not pretend to deal
with malevolent adversaries. Nevertheless, I agree with you that we
should not leave ourselves wide-open for attack either.

>Also, without the synchronization issue on sequenceCount resolved,
>totally unrelated logging events from different threads may compare
>equal since they might have the same timestamp and sequence number.

The synchronization issue on sequenceCount should now be resolved.

>Is there a particular need to implement value comparison of logging
>events?  There don't appear to be any unit tests that depend on it.
>Deleting the equals and getHashCode implementation didn't make any test
>cases fail.  The coverage analysis indicated that LoggingEvent.equals
>was called 300386 times, but every one returned true since (this ==
>rObject) was true, but I didn't track down where LoggingEvent.equals
>was called.

The equals method comes into play when a receiver reads a source of
logging events multiple times. It's also useful in making selective
SQL queries.

-- 
Ceki Gülcü

   The complete log4j manual: http://www.qos.ch/log4j/

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Curt Arnold <ca...@apache.org>.

On Dec 27, 2004, at 3:59 PM, Ceki Gülcü wrote:
  (Switching order of comments)

 >What do you mean by "the value of sequenceCount may be slightly out of  
sync on different threads"?

http://java.sun.com/docs/books/vmspec/2nd-edition/html/ 
Concepts.doc.html#33308 (toward the end)
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ 
Threads.doc.html#21294

The pertinent quotes are:

In the absence of explicit synchronization, an implementation is free  
to update the main memory in an order that may be surprising.  
Therefore, the programmer who prefers to avoid surprises should use  
explicit synchronization.

Best practice is that if a variable is ever to be assigned by one  
thread and used or assigned by another, then all accesses to that  
variable should be enclosed in synchronized methods or synchronized  
statements.

Also, Item 48 of Effective Java (Synchronize access to shared mutable  
data) uses an example which appears to be almost exactly the scenario  
in LoggingEvent and goes to great length why the code may not perform  
as expected.  Basically each thread is free to maintain its own copy of  
sequenceCount and is only required to synchronize its values at thread  
synchronization.

>>
>> The sequenceNumber+timestamp combination makes it easy to implement  
>> equals() method for LoggingEvent.

The implementation might be easy, but appears problematic.  The default  
implementation of equals and getHashCode would be preferred if all you  
cared about was object identity.  If you really care about value  
equality, you are saying that only the class type, timestamp and  
sequence number are significant.  I could abuse the current  
implementation of equals by creating a LoggingEvent, cloning it (which  
could be done using serialization if need be) and changing the level  
and message of the clone, original.equals(clone) would still return  
true since the timestamp and sequence number are still identical.

Also, without the synchronization issue on sequenceCount resolved,  
totally unrelated logging events from different threads may compare  
equal since they might have the same timestamp and sequence number.

Is there a particular need to implement value comparison of logging  
events?  There don't appear to be any unit tests that depend on it.   
Deleting the equals and getHashCode implementation didn't make any test  
cases fail.  The coverage analysis indicated that LoggingEvent.equals  
was called 300386 times, but every one returned true since (this ==  
rObject) was true, but I didn't track down where LoggingEvent.equals  
was called.

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org

Re: LoggingEvent.sequenceCount

Posted by Ceki Gülcü <ce...@qos.ch>.

The sequenceNumber+timestamp combination makes it easy to implement 
equals() method for LoggingEvent.

What do you mean by "the value of sequenceCount may be slightly out of sync 
on different threads"?

At 10:45 PM 12/27/2004, you wrote:
>I'm a bit concerned about LoggingEvent.sequenceNumber and 
>sequenceCount.  It appears from a code review that unless externally 
>synchronized, it would be possible to end up with multiple logging events 
>with the same sequence number since the value of sequenceCount may be 
>slightly out of sync on different threads.
>
>Before diving into addressing any issues around that, I'd like to know 
>more of the motivation for adding sequence number and implementing it on 
>LoggingEvent.   I did a simple search on the mailing list and didn't find 
>any messages that appeared to be on this topic.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
>For additional commands, e-mail: log4j-dev-help@logging.apache.org

-- 
Ceki Gülcü

   The complete log4j manual: http://www.qos.ch/log4j/



---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org