You are viewing a plain text version of this content. The canonical link for it is here.
Posted to log4j-dev@logging.apache.org by Elias Ross <er...@m-qube.com> on 2005/01/28 03:00:05 UTC
Why concurrency?
http://www.gotw.ca/publications/concurrency-ddj.htm
>>From the article:
> Concurrency is the next major revolution in how we write software.
> Different experts still have different opinions on whether it will be
> bigger than OO, but that kind of conversation is best left to pundits.
> For technologists, the interesting thing is that concurrency is of the
> same order as OO both in the (expected) scale of the revolution and in
> the complexity and learning curve of the technology.
[...]
> Benefits and Costs of Concurrency
>
> There are two major reasons for which concurrency, especially
> multi-threading, is already used in mainstream software. The first is
> to logically separate naturally independent control flows; for
> example, in a database replication server I designed it was natural to
> put each replication session on its own thread, because each session
> worked completely independently of any others that might be active (as
> long as they weren’t working on the same database row). The second and
> less common reason to write concurrent code in the past has been for
> performance, either to scalably take advantage of multiple physical
> CPUs or to easily take advantage of latency in other parts of the
> application; in my database replication server, this factor applied as
> well and the separate threads were able to scale well on multiple CPUs
> as our server handled more and more concurrent replication sessions
> with many other servers.
As an aside, I was wondering if anybody looked at the code I posted the
other day?
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Why concurrency?
Posted by Ceki Gülcü <ce...@qos.ch>.
At 08:45 PM 1/28/2005, Curt Arnold wrote:
>On Jan 27, 2005, at 8:00 PM, Elias Ross wrote:
>
>>
>>http://www.gotw.ca/publications/concurrency-ddj.htm
>
>I think that we are all fans of concurrency here. Any hesitancy is due to
>caution or overwork, not interest.
+1
--
Ceki Gülcü
The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Sequence count, WAS: Why concurrency?
Posted by Elias Ross <er...@m-qube.com>.
On Mon, 2005-01-31 at 16:27, Curt Arnold wrote:
> and these messages were dispatched to two distinct appenders (possibly
> of different types) and that one instance of Chainsaw was listening to
> both, it could determine that which logging events came from the same
> iteration of the loop.
An assign sequence number method could first check if a sequence number
was already assigned:
class LoggingEvent ... {
public synchronized void assignNewSequenceNumber(long count) {
if (this.sequenceNumber != 0)
return;
this.sequenceNumber = count;
}
}
class ChainsawAppender {
long seq = 0;
public synchronized void doAppend(LoggingEvent le) {
le.assignSequenceNumber(seq++);
}
}
Or:
class LoggingEvent ... {
static long count;
private synchronized void writeObject() {
if (this.sequenceNumber == 0) {
synchronized (LoggingEvent.class) {
this.sequenceNumber = count++;
}
}
}
}
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Sequence count, WAS: Why concurrency?
Posted by Curt Arnold <ca...@apache.org>.
On Jan 31, 2005, at 5:19 PM, Elias Ross wrote:
> On Mon, 2005-01-31 at 14:53, Curt Arnold wrote:
>
>> So I think that users
>> will just have to accept that messages in the log from a multithreaded
>> application may appear in a slightly different sequence than the
>> requests were made.
>
> Which is sort of the reasoning behind having the appender generate the
> sequence number -- not the class itself. Then, you be sure that events
> appearing in a file would appear in the correct order. This would be
> part of the contract for Appender.doAppend.
>
> For appenders that don't have any reason to generate the sequence
> number
> (WriterAppender, et all.) you don't generate one. ChainsawAppender and
> SocketAppender would create one.
>
If you go back to the earlier thread on sequence number when I was
trying to find out the motivation behind it, I speculated that it might
have been added to allow detection of dropped messages. If that had
been the case, then an appender based sequence number might have been a
reasonable approach. However that was not the motivation. The
motivation was to have some sort of object ID so you could tell the
difference between two serialized forms of logging events that were
identical in all aspects (message, timestamp, etc) except arose from
distinct requests. For example, if I had a very tight loop like
(assuming each iteration) takes less than 1 ms:
for (int i = 0; i < 10; i++) {
logger.debug("Hello, world");
}
and these messages were dispatched to two distinct appenders (possibly
of different types) and that one instance of Chainsaw was listening to
both, it could determine that which logging events came from the same
iteration of the loop.
Even if the sequenceCount was done on the appender level, it still does
not solve the sequence indeterminacy issue since there would need to be
a synchronization block on its sequenceCount which could still result
in the first request to block on the lock possibly not being the first
to acquire the lock when it becomes available.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Sequence count, WAS: Why concurrency?
Posted by Elias Ross <er...@m-qube.com>.
On Mon, 2005-01-31 at 14:53, Curt Arnold wrote:
> So I think that users
> will just have to accept that messages in the log from a multithreaded
> application may appear in a slightly different sequence than the
> requests were made.
Which is sort of the reasoning behind having the appender generate the
sequence number -- not the class itself. Then, you be sure that events
appearing in a file would appear in the correct order. This would be
part of the contract for Appender.doAppend.
For appenders that don't have any reason to generate the sequence number
(WriterAppender, et all.) you don't generate one. ChainsawAppender and
SocketAppender would create one.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Sequence count, WAS: Why concurrency?
Posted by Curt Arnold <ca...@apache.org>.
On Jan 31, 2005, at 3:11 PM, Ceki Gülcü wrote:
> At 09:51 PM 1/31/2005, Curt Arnold wrote:
> Recently, I stumbled upon an interesting use case for sequenceCount.
> When the AsynAppender buffer is full and a event is processed, one of
> the waiting threads is notified that a slot became available. The
> order in which threads are notified is not specified by the Java
> language. They can be notified more or less in random order.
>
> Consequently, events may be output out of order by AsynAppender.
> Sequence count can help the user to order the output.
>
> The above should be viewed as a marginally important use case for
> sequenceCount, but a use case it is nonetheless.
>
I think the scenario described would affect any appender that needs to
synchronize a shared resource (like the event queue, socket, file etc)
which would include almost all appenders. In the case of an asych
appender, there would be an opportunity to reorder events into request
time instead of dispatch time while in the queue, however you would not
have that opportunity with say a SocketAppender. So I think that users
will just have to accept that messages in the log from a multithreaded
application may appear in a slightly different sequence than the
requests were made.
Then the question becomes what to do if the millisecond timestamp is
not sufficiently precise for the analytical need. Use of a
high-performance timer (like System.nanoTime) might be a better
solution than getSequenceCount. However, if you really need to know
the precise timeline of a multithreaded app, it might be something that
you might have to handle in the calling application. Even without a
synchronization issue, it would be possible for two nearly simultaneous
calls to logger.debug to have the LoggingEvent constructor (where the
timestamp is determined) in a different order since the cost of walking
the tree may vary and the thread priorities may be different.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Sequence count, WAS: Why concurrency?
Posted by Endre Stølsvik <En...@Stolsvik.com>.
|
| Recently, I stumbled upon an interesting use case for sequenceCount. When
| the AsynAppender buffer is full and a event is processed, one of the
| waiting threads is notified that a slot became available. The order in
| which threads are notified is not specified by the Java language. They can
| be notified more or less in random order.
|
| Consequently, events may be output out of order by AsynAppender. Sequence
| count can help the user to order the output.
|
| The above should be viewed as a marginally important use case for
| sequenceCount, but a use case it is nonetheless.
This should be done with a "waiting list" pattern, rather than relying on
this specific "feature" of the actual LoggingEvent. I feel this rather
clearly points out the high synchronization that obviously must have gone
into the sequenceCount when it later can be used to order the actual
output of the LoggingEvents.
Endre
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Sequence count, WAS: Why concurrency?
Posted by Ceki Gülcü <ce...@qos.ch>.
At 09:51 PM 1/31/2005, Curt Arnold wrote:
>>>Sorry, "getSequenceCount" was a short-hand reference to a fairly
>>>involved thread about a month ago. A sequence number is currently
>>>being used as a logging event identifier, but there are likely better
>>>ways of meeting the actual requirements. The thread is pretty long and
>>>convoluted, most of it was trying to figure out what the actual
>>>requirements were. Having a monotonically increasing message sequence
>>>number wasn't one of them. Having an implementation of equals and
>>>hasCode were requirements (from the Chainsaw team).
>>
>>I don't see why the Chainsaw team is causing grief for you. If their
>>particular application requires "equals" and "hashCode", they can
>>implement their own wrappers as appropriate for logging events.
>>
>>It appears their requirements are negatively impacting the base
>>architecture of log4j. I'm sure there is a reasonable compromise to be
>>made.
>
>At this point, they have expressed their preference to have equals and
>hashCode with value semantics and have code that depends its presence.
>At this point, I do not have compelling evidence that it is a bad thing
>and its best to proceed under the assumption that they knew their
>requirements. Adding wrappers to Chainsaw would cause an increase in
>total complexity by removing some complexity from the log4j core but
>adding more in Chainsaw.
Independently of Chainsaw's requirements, adding equals and hashCode to
LoggingEvent is a good thing because equals/hashCode also make writing
log4j test cases easier.
The fact that sequenceCount was introduced as a helper data field for the
equals and hashCode methods was *my* choice of optimization. With
hindsight, we can all agree that it was a poor optimization. Future
implementations of equals and hashCode may not rely on sequenceCount (or
eventId) as heavily.
Recently, I stumbled upon an interesting use case for sequenceCount. When
the AsynAppender buffer is full and a event is processed, one of the
waiting threads is notified that a slot became available. The order in
which threads are notified is not specified by the Java language. They can
be notified more or less in random order.
Consequently, events may be output out of order by AsynAppender. Sequence
count can help the user to order the output.
The above should be viewed as a marginally important use case for
sequenceCount, but a use case it is nonetheless.
--
Ceki Gülcü
The complete log4j manual: http://www.qos.ch/log4j/
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Why concurrency?
Posted by Curt Arnold <ca...@apache.org>.
On Jan 31, 2005, at 12:33 PM, Elias Ross wrote:
> On Sat, 2005-01-29 at 01:34, Curt Arnold wrote:
>
> How would my patch break the build or their build?
>
> Assuming I don't change any existing code, only add new classes and
> methods, this wouldn't force any changes on this other projects.
>
I've admitted that I have not had chance to review your code. Even if
it were totally isolated and would not affect existing users (which I
would still need to confirm), there is still effort required that the
files contain the appropriate license notices, have unit tests, etc.
Basically code doesn't go in until a committer has reviewed it and we
are all finite beings typically working on our own time. We are
grateful for the effort behind your contributions, but the fact we
haven't reviewed and integrated it within a week or two of submission
doesn't bother me.
>> Currently all our performance tests are single threaded. In this
>> case,
>> we need a mock multithreaded application that will dispatch log
>> messages to identify contentions. There are a variety of tools that
>> can be used to collect and analyze the system under test, but you need
>> to frame the benchmark problem first.
>
> I included a mock application to perform some simple benchmarking with
> my patch, did you not see it? This is how I measured the large amount
> of contention in the current 1.3 code base. It also showed about a 10%
> increase in performance on multi-processor machines.
I thought that there was some concern that your sample application was
not representative. So maybe the first step is to review your mock
application instead of writing one.
>
>> Sorry, "getSequenceCount" was a short-hand reference to a fairly
>> involved thread about a month ago. A sequence number is currently
>> being used as a logging event identifier, but there are likely better
>> ways of meeting the actual requirements. The thread is pretty long
>> and
>> convoluted, most of it was trying to figure out what the actual
>> requirements were. Having a monotonically increasing message sequence
>> number wasn't one of them. Having an implementation of equals and
>> hasCode were requirements (from the Chainsaw team).
>
> I don't see why the Chainsaw team is causing grief for you. If their
> particular application requires "equals" and "hashCode", they can
> implement their own wrappers as appropriate for logging events.
>
> It appears their requirements are negatively impacting the base
> architecture of log4j. I'm sure there is a reasonable compromise to be
> made.
>
At this point, they have expressed their preference to have equals and
hashCode with value semantics and have code that depends its presence.
At this point, I do not have compelling evidence that it is a bad thing
and its best to proceed under the assumption that they knew their
requirements. Adding wrappers to Chainsaw would cause an increase in
total complexity by removing some complexity from the log4j core but
adding more in Chainsaw.
>> Sounds like the "Doctor. it hurts me when I do this" joke. There are
>> certain operations (like retrieving location information) that are
>> known to be expensive. log4j provides mechanisms to get the info if
>> you really want it. The cost is high, but avoidable.
>
> Did you look at the code? ThrowableInformation.extractStringRep is
> called when you do
>
> Logger.debug("", new Throwable());
And you expect that to be a high frequency call? Exception handling
code is expensive in general and should only occur rarely. Obviously
everything should be as fast as possible, but the priority has to be on
optimizing the high-frequency cases. If you are seeing 20% of your
time in spent in logging exception stack traces, then it likely that
another 20% is spent in the overhead of raising and catching
exceptions. If this was a real application, I would serious look at
your use of exceptions. If this was a mock application, I'd be
concerned that it isn't representative of actual use.
>
> Is there any particular reason why for JDK 1.4 reflection is used to
> print the stack trace?
>
Someone else will have to answer that.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Why concurrency?
Posted by Elias Ross <er...@m-qube.com>.
On Sat, 2005-01-29 at 01:34, Curt Arnold wrote:
> Sorry, A committer who blindly checked in code they hadn't reviewed
> would likely not remain a committer for long. Actually, they would
> more likely get very hostile emails. If anyone in the community is
> just dying to use your patches, there is nothing to stop them from
> merging it into their working builds. However, by committing it into
> the CVS, I'm at least forcing the changes on the 40 projects or so in
> Gump that build against our nightlies.
How would my patch break the build or their build?
Assuming I don't change any existing code, only add new classes and
methods, this wouldn't force any changes on this other projects.
> Currently all our performance tests are single threaded. In this case,
> we need a mock multithreaded application that will dispatch log
> messages to identify contentions. There are a variety of tools that
> can be used to collect and analyze the system under test, but you need
> to frame the benchmark problem first.
I included a mock application to perform some simple benchmarking with
my patch, did you not see it? This is how I measured the large amount
of contention in the current 1.3 code base. It also showed about a 10%
increase in performance on multi-processor machines.
> Sorry, "getSequenceCount" was a short-hand reference to a fairly
> involved thread about a month ago. A sequence number is currently
> being used as a logging event identifier, but there are likely better
> ways of meeting the actual requirements. The thread is pretty long and
> convoluted, most of it was trying to figure out what the actual
> requirements were. Having a monotonically increasing message sequence
> number wasn't one of them. Having an implementation of equals and
> hasCode were requirements (from the Chainsaw team).
I don't see why the Chainsaw team is causing grief for you. If their
particular application requires "equals" and "hashCode", they can
implement their own wrappers as appropriate for logging events.
It appears their requirements are negatively impacting the base
architecture of log4j. I'm sure there is a reasonable compromise to be
made.
> Sounds like the "Doctor. it hurts me when I do this" joke. There are
> certain operations (like retrieving location information) that are
> known to be expensive. log4j provides mechanisms to get the info if
> you really want it. The cost is high, but avoidable.
Did you look at the code? ThrowableInformation.extractStringRep is
called when you do
Logger.debug("", new Throwable());
Is there any particular reason why for JDK 1.4 reflection is used to
print the stack trace?
Thanks.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Why concurrency?
Posted by Curt Arnold <ca...@apache.org>.
On Jan 28, 2005, at 6:30 PM, Elias Ross wrote:
> On Fri, 2005-01-28 at 11:45, Curt Arnold wrote:
>
> Would it be possible to check in the code as-is and if there is any
> future interest (or disinterest) incorporate or remove as appropriate?
>
Sorry, A committer who blindly checked in code they hadn't reviewed
would likely not remain a committer for long. Actually, they would
more likely get very hostile emails. If anyone in the community is
just dying to use your patches, there is nothing to stop them from
merging it into their working builds. However, by committing it into
the CVS, I'm at least forcing the changes on the 40 projects or so in
Gump that build against our nightlies.
>> I haven't, but I have been following the discussion. I think the
>> initial requirement before considering integrating any of the code is
>> to have a reasonable multi-thread performance test. Then likely we'd
>> look at each aspect in your code along with other queued concurrency
>> related issues like the getSequenceCount issue.
>
> Unfortunately, you can't get much information out of "for loops" and
> "System.currentTimeMillis()" with respect to thread contention.
> Usually, you write a class that exercises the code, then exercise that
> test on a multi-threaded machine, and use a tool that can properly
> evaluate the contention and time spent in your program.
>
> There are several free programs that can help. Commercial tools are
> probably ideal, but a proper tool is 10X better for this sort of task
> than taking in-JVM timings.
Currently all our performance tests are single threaded. In this case,
we need a mock multithreaded application that will dispatch log
messages to identify contentions. There are a variety of tools that
can be used to collect and analyze the system under test, but you need
to frame the benchmark problem first.
>
>
> Regarding the getSequenceCount, this is what I did for a library I
> worked on that had messages with sequence numbers:
Sorry, "getSequenceCount" was a short-hand reference to a fairly
involved thread about a month ago. A sequence number is currently
being used as a logging event identifier, but there are likely better
ways of meeting the actual requirements. The thread is pretty long and
convoluted, most of it was trying to figure out what the actual
requirements were. Having a monotonically increasing message sequence
number wasn't one of them. Having an implementation of equals and
hasCode were requirements (from the Chainsaw team).
> Again, if you really care about performance or evaluation of theories,
> you can spend some time (ultimately less time) with profiling tools.
>
> For example, one slow spot I found was:
>
> org.apache.log4j.spi.ThrowableInformation.extractStringRep()
>
> This class iterates through the methods on every Throwable, using quite
> a lot of CPU (10-20%). This seems like a good feature for 1.3 users,
> but perhaps unnecessary on JDK 1.4? Plus, most people who implement
> getCause also have a printStackTrace that does the right thing.
>
>
Sounds like the "Doctor. it hurts me when I do this" joke. There are
certain operations (like retrieving location information) that are
known to be expensive. log4j provides mechanisms to get the info if
you really want it. The cost is high, but avoidable.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Why concurrency?
Posted by Elias Ross <er...@m-qube.com>.
On Fri, 2005-01-28 at 11:45, Curt Arnold wrote:
> I think that we are all fans of concurrency here. Any hesitancy is due
> to caution or overwork, not interest.
Thanks for taking the time to reply. I don't want to interfere with any
high priority items for the log4j team at this time. Over e-mail, it's
hard to tell if people are disinterested or just otherwise occupied.
Would it be possible to check in the code as-is and if there is any
future interest (or disinterest) incorporate or remove as appropriate?
> I haven't, but I have been following the discussion. I think the
> initial requirement before considering integrating any of the code is
> to have a reasonable multi-thread performance test. Then likely we'd
> look at each aspect in your code along with other queued concurrency
> related issues like the getSequenceCount issue.
Unfortunately, you can't get much information out of "for loops" and
"System.currentTimeMillis()" with respect to thread contention.
Usually, you write a class that exercises the code, then exercise that
test on a multi-threaded machine, and use a tool that can properly
evaluate the contention and time spent in your program.
There are several free programs that can help. Commercial tools are
probably ideal, but a proper tool is 10X better for this sort of task
than taking in-JVM timings.
Regarding the getSequenceCount, this is what I did for a library I
worked on that had messages with sequence numbers:
0. Don't use a "class lock" -- if you can avoid them. Usually you have
multiple class-loaders anyway.
1. Assign a sequence number when serializing. Since the event really
doesn't exist until it is stored, there's no point in generating one
unless the objects are stored. (In memory, the same Event exists as the
same object.)
So, it's really the storage/stream class that should have the
responsibility of generating the sequence number. If you are
considering storing a bunch of files, then each event, plus a marker
indicating when you opened the stream indicates the uniqueness of the
event.
3. Don't implement equals and hashCode: This allows the packet itself
to avoid the issue. The issue of equality really is external to the
class.
4. If you want to compare two events, use a wrapper that compares the
marker of the file/stream plus the sequence number of the event.
5. If you're outputting the same event to two different streams, then
mark both streams with the same marker. (And don't reassign a sequence
number once set.)
(Note: java.util.logging.LogRecord doesn't implement equals or
hashCode. It does have a sequence ID, but I think it's pretty pointless
to have one unless you serialize the object.)
> I did look at Layout and its only concurrency issue seemed to be the
> convenience method "String format(LoggingEvent)" which is only used in
> a few places in log4j itself and a bit more in Chainsaw. Instead of
> doing much else to the class, I'd be in favor of either removing the
> method or marking it deprecated and have it allocate a new
> CharArrayWriter on each call (if it is deprecated it doesn't have to be
> fast, just safe and there).
Actually, I prefer to have the "String format(..)" method as you can
format the message without having to lock the output stream. First you
format, then you lock, then you write. (Rather than lock,
format+write.)
Again, if you really care about performance or evaluation of theories,
you can spend some time (ultimately less time) with profiling tools.
For example, one slow spot I found was:
org.apache.log4j.spi.ThrowableInformation.extractStringRep()
This class iterates through the methods on every Throwable, using quite
a lot of CPU (10-20%). This seems like a good feature for 1.3 users,
but perhaps unnecessary on JDK 1.4? Plus, most people who implement
getCause also have a printStackTrace that does the right thing.
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org
Re: Why concurrency?
Posted by Curt Arnold <ca...@apache.org>.
On Jan 27, 2005, at 8:00 PM, Elias Ross wrote:
>
> http://www.gotw.ca/publications/concurrency-ddj.htm
>
I think that we are all fans of concurrency here. Any hesitancy is due
to caution or overwork, not interest.
> As an aside, I was wondering if anybody looked at the code I posted the
> other day?
>
I haven't, but I have been following the discussion. I think the
initial requirement before considering integrating any of the code is
to have a reasonable multi-thread performance test. Then likely we'd
look at each aspect in your code along with other queued concurrency
related issues like the getSequenceCount issue.
I did look at Layout and its only concurrency issue seemed to be the
convenience method "String format(LoggingEvent)" which is only used in
a few places in log4j itself and a bit more in Chainsaw. Instead of
doing much else to the class, I'd be in favor of either removing the
method or marking it deprecated and have it allocate a new
CharArrayWriter on each call (if it is deprecated it doesn't have to be
fast, just safe and there).
---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-dev-help@logging.apache.org