You are viewing a plain text version of this content. The canonical link for it is here.
Posted to log4j-dev@logging.apache.org by Curt Arnold <ca...@apache.org> on 2004/12/27 22:45:43 UTC
LoggingEvent.sequenceCount
I'm a bit concerned about LoggingEvent.sequenceNumber and
sequenceCount. It appears from a code review that unless externally
synchronized, it would be possible to end up with multiple logging
events with the same sequence number since the value of sequenceCount
may be slightly out of sync on different threads.
Before diving into addressing any issues around that, I'd like to know
more of the motivation for adding sequence number and implementing it
on LoggingEvent. I did a simple search on the mailing list and didn't
find any messages that appeared to be on this topic.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Endre Stølsvik <En...@Stolsvik.com>.
On Tue, 28 Dec 2004, Ceki Gülcü wrote:
| Anyway, a non-trivial amount of thinking went into the implementation of
| the equals() method. In the absence of a malevolent adversary, it is also
| correct although only probabilistically so. Its only downside is the
| addition of the static sequenceCount field and the sequenceNumber field in
| LoggingEvent.
How often is this number increased?
LoggingEvent sounds like a thing that is made for every log-invocation
that goes through (that is, that the log-call's level is above or equal to
the category's level, and these evaluations).
If that is so, then having a synchronized method -on the class- sounds
like a noticable synchronization impact, if you log much. A synchronized
block is in itself somewhat expensive, but in a multi threaded application
(e.g. Servlet Container), if two synchs happen to "hit" (e.g. on a
multi-CPU, or HT, or even on a normal preemptive multitasking OS), then
the contention time might become, as I've understood it, "very" high. Thus
it is interesting to avoid global synch-bottelnecks.
Isn't there some synchronization going on anyway when a log-invocation
goes through, on the Category and Appender levels? Couldn't this
LoggingEvent sequenced number be made at the same time, avoiding the
additional synch block? Maybe one could use some Category Id (if there is
such a thing, if not, maybe make it - this won't be an impact), and add a
category-local sequenced number (one of them multiplied by some 10^a
constant, to give two separate spaces). This would reduce the global
contention, at least it wouldn't introduce any more synchs.
Java 5.0 have synchronization contention monitoring accessible from within
the application - maybe some unit tests could be devised?
java.lang.management.ThreadMXBean.
PS: I agree that the name should be changed if it really shouldn't reflect
the sequence of the LoggingEvent, but merely is used as an equals-device.
Maybe call it logginEventUniqueId or something like this..
PPS: How much synchronization have gone into log4j? Ideally, I guess, it
should -only- synchronize on Logger.getLogger() and when actually doing
the appending?
Endre
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Ceki Gülcü <ce...@qos.ch>.
Anyway, a non-trivial amount of thinking went into the implementation of
the equals() method. In the absence of a malevolent adversary, it is also
correct although only probabilistically so. Its only downside is the
addition of the static sequenceCount field and the sequenceNumber field in
LoggingEvent.
Besides helping in the efficient implementation of the equals() method
(needed by Chainsaw), the sequenceNumber field in LoggingEvent also greatly
simplifies the implementation of receivers needing to repeatedly query a
data store, e.g. DBReceiver.
So imho sequenceCount provides real added value.
At 10:11 PM 12/28/2004, you wrote:
>At 09:49 PM 12/28/2004, Curt Arnold wrote:
>>The point that I was trying to make and which I still think is valid
>>despite my mistaken analysis of the hashCode function, is that it is
>>better to have Chainsaw to make whatever comparison it thinks is
>>appropriate for its filtering needs and avoid having to worry about
>>maintaining the equals contract and having a hashCode function that
>>results in good performance if LoggingEvents are added to a HashTable.
>
>--
>Ceki Gülcü
>
> The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Ceki Gülcü <ce...@qos.ch>.
At 09:49 PM 12/28/2004, Curt Arnold wrote:
>The point that I was trying to make and which I still think is valid
>despite my mistaken analysis of the hashCode function, is that it is
>better to have Chainsaw to make whatever comparison it thinks is
>appropriate for its filtering needs and avoid having to worry about
>maintaining the equals contract and having a hashCode function that
>results in good performance if LoggingEvents are added to a HashTable.
Are you worried about having to implement LoggingEvent.equals() and
hashCode() in C++?
--
Ceki Gülcü
The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Curt Arnold <ca...@apache.org>.
On Dec 28, 2004, at 2:06 PM, Ceki Gülcü wrote:
>>
>
> The code currently says:
>
> /**
> * The hashcode is computed as XOR of the lower 32 bits of
> sequenceNumber and
> * the higher 32 bits of timeStamp;
> */
> public int hashCode() {
> return (int)((timeStamp >> 32) ^ (sequenceNumber & 0xFFFFFFFF));
> }
>
> In what way does the comment not match the code?
Sorry, I misread the comment. Thought it said lower 32 bits of the
timestamp.
>
> The hashcode does not need to uniformly spread out the returned values
> over the range of integer. It just has to make a minimal effort to
> avoid that many values do not pile up on top of each other. Given that
> all hash tables algorithms compute the hash value over modulo 'n'
> where 'n' is a small prime number, for all practical purposes the
> following hashcode would have been amply sufficient.
>
> public int hashCode() {
> return (int)((sequenceNumber & 0xFFFFFFFF));
> }
>
> We are doing a little better by XORing with the higher 32 bits of the
> timestamp. (2 to 32 milliseconds is about 50 days.) A slightly better
> implementation would have been
>
>
> public int hashCode() {
> return (int)((timeStamp >> 20) ^ (sequenceNumber & 0xFFFFFFFF));
> }
>
> 2^20 millis = 1048576 millis or about 17 minutes.
The point that I was trying to make and which I still think is valid
despite my mistaken analysis of the hashCode function, is that it is
better to have Chainsaw to make whatever comparison it thinks is
appropriate for its filtering needs and avoid having to worry about
maintaining the equals contract and having a hashCode function that
results in good performance if LoggingEvents are added to a HashTable.
>
>
>> p.s. LoggingEvent has commented out methods xgetMDCCopy and
>> XXgetMDCKeySet. Are these going to be deleted or uncommented?
>
> They'll eventually be deleted.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Ceki Gülcü <ce...@qos.ch>.
At 08:23 PM 12/28/2004, Curt Arnold wrote:
>On Dec 28, 2004, at 7:50 AM, Ceki Gülcü wrote:
>The sequenceNumber+timestamp combination makes it easy to implement
>>>>>
>>>>>equals() method for LoggingEvent.
>>>
>>>
>>>The implementation might be easy, but appears problematic. The default
>>>implementation of equals and getHashCode would be preferred if all you
>>>cared about was object identity.
>>
>>Object identity is not enough. Chainsaw as well as other receivers may
>>access a source of LoggingEvent multiple times. In that case, we need
>>an efficient way to discard duplicate LoggingEvents.
>
>It may be preferable to move the definition of "close enough for chainsaw"
>or "close enough for queries" out into the consuming classes. I'll have
>to dig into this later since I'm on the road and not really set up for
>this type of code analysis.
>
>If log4j 1.2.8 logging events are deserialized, it looks like they may all
>have the same sequence number so chainsaw might want to additional
>checks if the timestamps are the same and the sequenceNumber is the
>default value.
>
>By overriding equals, you also are required to override hashCode (which
>you do) which ideally distributes the hash values uniformly over the value
>space of int. By my reading, it looks like the hash code is an XOR of the
>high 32 bits of the timestamp (despite the comment) and the sequence
>number. If I'm reading it right, that looks like all logging events with
>a given sequenceNumber within a fairly large timeframe (4 million seconds,
>11 days or so) would have the same hash code. Even it it is fixed to use
>the lower 32-bits, the hash codes still would be fairly tightly
>clustered. We could write a better hashCode, I'm just pointing out one of
>the side effects of choosing to override equals.
The code currently says:
/**
* The hashcode is computed as XOR of the lower 32 bits of
sequenceNumber and
* the higher 32 bits of timeStamp;
*/
public int hashCode() {
return (int)((timeStamp >> 32) ^ (sequenceNumber & 0xFFFFFFFF));
}
In what way does the comment not match the code?
The hashcode does not need to uniformly spread out the returned values over
the range of integer. It just has to make a minimal effort to avoid that
many values do not pile up on top of each other. Given that all hash tables
algorithms compute the hash value over modulo 'n' where 'n' is a small
prime number, for all practical purposes the following hashcode would have
been amply sufficient.
public int hashCode() {
return (int)((sequenceNumber & 0xFFFFFFFF));
}
We are doing a little better by XORing with the higher 32 bits of the
timestamp. (2 to 32 milliseconds is about 50 days.) A slightly better
implementation would have been
public int hashCode() {
return (int)((timeStamp >> 20) ^ (sequenceNumber & 0xFFFFFFFF));
}
2^20 millis = 1048576 millis or about 17 minutes.
>p.s. LoggingEvent has commented out methods xgetMDCCopy and
>XXgetMDCKeySet. Are these going to be deleted or uncommented?
They'll eventually be deleted.
--
Ceki Gülcü
The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Curt Arnold <ca...@apache.org>.
On Dec 28, 2004, at 7:50 AM, Ceki Gülcü wrote:
The sequenceNumber+timestamp combination makes it easy to implement
>>>>
>>>> equals() method for LoggingEvent.
>>
>>
>> The implementation might be easy, but appears problematic. The
>> default
>> implementation of equals and getHashCode would be preferred if all you
>> cared about was object identity.
>
> Object identity is not enough. Chainsaw as well as other receivers may
> access a source of LoggingEvent multiple times. In that case, we need
> an efficient way to discard duplicate LoggingEvents.
It may be preferable to move the definition of "close enough for
chainsaw" or "close enough for queries" out into the consuming classes.
I'll have to dig into this later since I'm on the road and not really
set up for this type of code analysis.
If log4j 1.2.8 logging events are deserialized, it looks like they may
all have the same sequence number so chainsaw might want to additional
checks if the timestamps are the same and the sequenceNumber is the
default value.
By overriding equals, you also are required to override hashCode (which
you do) which ideally distributes the hash values uniformly over the
value space of int. By my reading, it looks like the hash code is an
XOR of the high 32 bits of the timestamp (despite the comment) and the
sequence number. If I'm reading it right, that looks like all logging
events with a given sequenceNumber within a fairly large timeframe (4
million seconds, 11 days or so) would have the same hash code. Even it
it is fixed to use the lower 32-bits, the hash codes still would be
fairly tightly clustered. We could write a better hashCode, I'm just
pointing out one of the side effects of choosing to override equals.
All in all, I think overriding equals just gives us headaches we could
avoid by doing the appropriate comparison in the consuming classes.
p.s. LoggingEvent has commented out methods xgetMDCCopy and
XXgetMDCKeySet. Are these going to be deleted or uncommented?
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Ceki Gülcü <ce...@qos.ch>.
At 06:11 AM 12/28/2004, Curt Arnold wrote:
>Also, Item 48 of Effective Java (Synchronize access to shared mutable
>data) uses an example which appears to be almost exactly the scenario
>in LoggingEvent and goes to great length why the code may not perform
>as expected. Basically each thread is free to maintain its own copy of
>sequenceCount and is only required to synchronize its values at thread
>synchronization.
Thanks for pointing this out. sequenceCount is a static long. Just the
'long' part mandates synchronization. The fact that is shared (static)
is another reason. Anyway, thanks again. A fix has been committed.
>>>The sequenceNumber+timestamp combination makes it easy to implement
>>>equals() method for LoggingEvent.
>
>
>The implementation might be easy, but appears problematic. The default
>implementation of equals and getHashCode would be preferred if all you
>cared about was object identity.
Object identity is not enough. Chainsaw as well as other receivers may
access a source of LoggingEvent multiple times. In that case, we need
an efficient way to discard duplicate LoggingEvents.
>If you really care about value
>equality, you are saying that only the class type, timestamp and
>sequence number are significant. I could abuse the current
>implementation of equals by creating a LoggingEvent, cloning it (which
>could be done using serialization if need be) and changing the level
>and message of the clone, original.equals(clone) would still return
>true since the timestamp and sequence number are still identical.
A malevolent adversary could easily fool the current equals
method. However, that would only cause a non-equal event as being seen
as equal, hence discarded as a duplicate. Not a very threatening
attack if you ask me. More importantly, log4j does not pretend to deal
with malevolent adversaries. Nevertheless, I agree with you that we
should not leave ourselves wide-open for attack either.
>Also, without the synchronization issue on sequenceCount resolved,
>totally unrelated logging events from different threads may compare
>equal since they might have the same timestamp and sequence number.
The synchronization issue on sequenceCount should now be resolved.
>Is there a particular need to implement value comparison of logging
>events? There don't appear to be any unit tests that depend on it.
>Deleting the equals and getHashCode implementation didn't make any test
>cases fail. The coverage analysis indicated that LoggingEvent.equals
>was called 300386 times, but every one returned true since (this ==
>rObject) was true, but I didn't track down where LoggingEvent.equals
>was called.
The equals method comes into play when a receiver reads a source of
logging events multiple times. It's also useful in making selective
SQL queries.
--
Ceki Gülcü
The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Curt Arnold <ca...@apache.org>.
On Dec 27, 2004, at 3:59 PM, Ceki Gülcü wrote:
(Switching order of comments)
>What do you mean by "the value of sequenceCount may be slightly out of
sync on different threads"?
http://java.sun.com/docs/books/vmspec/2nd-edition/html/
Concepts.doc.html#33308 (toward the end)
http://java.sun.com/docs/books/vmspec/2nd-edition/html/
Threads.doc.html#21294
The pertinent quotes are:
In the absence of explicit synchronization, an implementation is free
to update the main memory in an order that may be surprising.
Therefore, the programmer who prefers to avoid surprises should use
explicit synchronization.
Best practice is that if a variable is ever to be assigned by one
thread and used or assigned by another, then all accesses to that
variable should be enclosed in synchronized methods or synchronized
statements.
Also, Item 48 of Effective Java (Synchronize access to shared mutable
data) uses an example which appears to be almost exactly the scenario
in LoggingEvent and goes to great length why the code may not perform
as expected. Basically each thread is free to maintain its own copy of
sequenceCount and is only required to synchronize its values at thread
synchronization.
>>
>> The sequenceNumber+timestamp combination makes it easy to implement
>> equals() method for LoggingEvent.
The implementation might be easy, but appears problematic. The default
implementation of equals and getHashCode would be preferred if all you
cared about was object identity. If you really care about value
equality, you are saying that only the class type, timestamp and
sequence number are significant. I could abuse the current
implementation of equals by creating a LoggingEvent, cloning it (which
could be done using serialization if need be) and changing the level
and message of the clone, original.equals(clone) would still return
true since the timestamp and sequence number are still identical.
Also, without the synchronization issue on sequenceCount resolved,
totally unrelated logging events from different threads may compare
equal since they might have the same timestamp and sequence number.
Is there a particular need to implement value comparison of logging
events? There don't appear to be any unit tests that depend on it.
Deleting the equals and getHashCode implementation didn't make any test
cases fail. The coverage analysis indicated that LoggingEvent.equals
was called 300386 times, but every one returned true since (this ==
rObject) was true, but I didn't track down where LoggingEvent.equals
was called.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: LoggingEvent.sequenceCount
Posted by Ceki Gülcü <ce...@qos.ch>.
The sequenceNumber+timestamp combination makes it easy to implement
equals() method for LoggingEvent.
What do you mean by "the value of sequenceCount may be slightly out of sync
on different threads"?
At 10:45 PM 12/27/2004, you wrote:
>I'm a bit concerned about LoggingEvent.sequenceNumber and
>sequenceCount. It appears from a code review that unless externally
>synchronized, it would be possible to end up with multiple logging events
>with the same sequence number since the value of sequenceCount may be
>slightly out of sync on different threads.
>
>Before diving into addressing any issues around that, I'd like to know
>more of the motivation for adding sequence number and implementing it on
>LoggingEvent. I did a simple search on the mailing list and didn't find
>any messages that appeared to be on this topic.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
>For additional commands, e-mail: log4j-dev-help@logging.apache.org
--
Ceki Gülcü
The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org