You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@logging.apache.org by Ralph Goers <ra...@dslextreme.com> on 2019/07/02 21:21:50 UTC

Location performance

I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.



There are several things I should note about this:
	• The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644
	• Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
	• I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
	• If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.

As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.

I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 

However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.

Thoughts?

Ralph

Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
Well, I just noticed that GcFreeSynchronousLoggingTest is generating a bunch of errors. It seems some of the patterns in gcFreeLogging.xml have location info in the patterns so StackTraceElements are being created.  It would probably be simpler to honor the flag then try to reconcile the issues.

Ralph

> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
> 
> If memory serves correctly, the `includeLocation` was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern). 
> 
> Async loggers need to know upfront whether they need to pass location information to the consumer thread. 
> 
> If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway. 
> 
> About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?
> 
> Also I am generally cautious of custom benchmarks; any reason not to use JMH?
> 
> Overall impressive results, congratulations!
> 
> Remko.
> 
> (Shameless plug) Every java main() method deserves http://picocli.info
> 
>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?
>> 
>> Ralph
>> 
>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>> 
>>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>>> 
>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>>> 
>>> 
>>> 
>>> There are several things I should note about this:
>>>   • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>   • Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
>>>   • I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>>   • If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
>>> 
>>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>>> 
>>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
>>> 
>>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
>>> 
>>> Thoughts?
>>> 
>>> Ralph
>>> 
>>> 
>>> -- 
>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>> 
> 



Re: Location performance

Posted by Apache <ra...@dslextreme.com>.
Yes, that was a very interesting read and describes very well how JMH works.

Ralph

> On Jul 7, 2019, at 4:30 AM, Remko Popma <re...@gmail.com> wrote:
> 
> This may be of interest:
> https://www.researchgate.net/publication/333825812_What's_Wrong_With_My_Benchmark_Results_Studying_Bad_Practices_in_JMH_Benchmarks
> 
> 
> 
>> On Jul 6, 2019, at 5:49, Raman Gupta <ro...@gmail.com> wrote:
>> 
>> I'm no JMH expert... but, there is a lot more going on there than just
>> iterations to avoid the extra time for JIT compilation on warmup.
>> Checkout https://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf
>> for an idea of the kinds of problems naive timer-only benchmarks
>> encounter, and the different ways in which JMH solves them. There are
>> other good presentations about JMH at https://shipilev.net/ as well.
>> 
>> Secondly, JMH isn't de-optimizing the JVM (a la client JVM)
>> willy-nilly. It just provides an API that allows you to measure what
>> you (hopefully) think you are measuring e.g. blackholes to consume
>> results that would otherwise be optimized away by the JVM because the
>> result would otherwise be unused, and so forth. AFAIK,  it doesn't
>> apply these "de-optimizations" on its own -- it just provides an API
>> for you to use as needed.
>> 
>> Thirdly, I don't believe JMH does anything (by default) to optimize
>> away GC or anything like that -- it in fact allows you to consider the
>> affect of GC by running lots of iterations, so that the affect of GC
>> *is* taken into account in the benchmark. It does have a (non-default)
>> option to run a GC before each iteration for specific use cases (e.g.
>> benchmarks which generate lots of garbage). It also has a GC profiler
>> to understand the impact of GC on your results (see
>> https://github.com/Valloric/jmh-playground/blob/master/src/jmh/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java).
>> 
>> Regards,
>> Raman
>> 
>> 
>>> On Fri, Jul 5, 2019 at 4:06 PM Ralph Goers <ra...@dslextreme.com> wrote:
>>> 
>>> Actually, I would disagree. First, I don’t think the first thousand or slow iterations running slow would matter much with 1.5 million iterations. Lets say the first thousand take 10 times longer the the rest of the iterations. That is still only the equivalent of 10,000 iterations. That is a blip in 1.5 million.
>>> Second, measuring the impact of garbage collection should be a part of the benchmark if your code is generating garbage.
>>> 
>>> Again, I am not saying JMH doesn’t have its place, but I have to think something weird is going on when JMH is showing the no location benchmark to be about 4 times faster than the location benchmark where the simplistic loop test shows the elapsed time when including the location information to be about twice that of not including. I can tell that the call to StackWalker isn’t being optimized away because it still shows up in YourKit as the most expensive operation. And when I turn on immediateFlush flushing the file shows up in YourKit
>>> 
>>> Ralph
>>> 
>>>> On Jul 5, 2019, at 12:18 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>> 
>>>> What I mean is that JMH is written to avoid including
>>>> behind-the-scenes variability of the code execution that changes from
>>>> startup until the code has been compiled to native code, garbage
>>>> collection, synchronization points for threads (safe points), etc. All
>>>> these things become variable noise that affects benchmarks when the
>>>> code you're testing executes in an especially short period of time
>>>> (e.g., appending a log event is pretty fast relatively speaking).
>>>> 
>>>> Say you wrote a naive benchmark like this:
>>>> 
>>>> long start = System.nanoTime();
>>>> for (int i = 0; i < 1_000_000; i++) {
>>>> runBenchmark();
>>>> }
>>>> long end = System.nanoTime();
>>>> double averageExecutionNanos = (end - start) / 1_000_000.0;
>>>> 
>>>> This will give incorrect results because the first thousand or so
>>>> iterations will be run by the first gen compiler, then the next
>>>> iterations will be run via code output from the second gen compiler
>>>> which will be significantly faster. So then you might split that up
>>>> into a warmup block and a measurement block. You'll still be including
>>>> the execution time spent in garbage collection, thread
>>>> synchronization, JIT compilation, etc., that isn't relevant to the
>>>> benchmark for executing the code over time.
>>>> 
>>>>> On Fri, 5 Jul 2019 at 01:25, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>>>> 
>>>>> Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled.  When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized I wouldn’t trust that you will see the same behavior when running in a “real” environment.
>>>>> 
>>>>> For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods. All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html> <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html>> says is that @State will prevent “certain” optimizations from occurring.
>>>>> 
>>>>> Don’t get me wrong. I think there are many places where JHM benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.
>>>>> 
>>>>> Ralph
>>>>> 
>>>>>> On Jul 4, 2019, at 9:21 PM, Remko Popma <re...@gmail.com> wrote:
>>>>>> 
>>>>>> The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them.
>>>>>> 
>>>>>> Remko.
>>>>>> 
>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>> 
>>>>>>> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
>>>>>>> 
>>>>>>> I don’t understand what you mean with you last message.
>>>>>>> 
>>>>>>> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
>>>>>>> 
>>>>>>> Ralph
>>>>>>> 
>>>>>>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> And the checkpoint bias thing is alleviated amongst other noise from the
>>>>>>>> JVM that would otherwise invalidate the time measurements.
>>>>>>>> 
>>>>>>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> I thought JMH ran your code enough for JIT to take effect (though that’s
>>>>>>>>> also up to your warmup iteration setting), and the
>>>>>>>>> 
>>>>>>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>>>>>>>>> from master.
>>>>>>>>>> 
>>>>>>>>>> Benchmark                                       Mode  Cnt       Score
>>>>>>>>>> Error  Units
>>>>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>>>>>>>>> 350.465  ops/s
>>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>>>>>>>>> 2023.826  ops/s
>>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>>>>>>>>> 1452.810  ops/s
>>>>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>>>>>>>>> 6329.118  ops/s
>>>>>>>>>> 
>>>>>>>>>> and here are the results with all the changes I have made.
>>>>>>>>>> 
>>>>>>>>>> Benchmark                                       Mode  Cnt       Score
>>>>>>>>>> Error  Units
>>>>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>>>>>>>>> 1594.591  ops/s
>>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>>>>>>>>> 2309.118  ops/s
>>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>>>>>>>>> 3103.615  ops/s
>>>>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>>>>>>>>> 2535.387  ops/s
>>>>>>>>>> 
>>>>>>>>>> Ralph
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Remko,
>>>>>>>>>>> 
>>>>>>>>>>> I ran the existing JHM benchmarks for both no location and with
>>>>>>>>>> location and there is still a huge difference. The problem is I don’t
>>>>>>>>>> necessarily believe the results. Nor do I think the benchmark I have been
>>>>>>>>>> using is totally accurate either. The problem is that, as I understand it,
>>>>>>>>>> JMH limits or eliminates much of the optimization the Java runtime will
>>>>>>>>>> perform. This is bad for Log4j because in some ways our code is written
>>>>>>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>>>>>>>>> much logging the same record with an incremented value from the same
>>>>>>>>>> location it is hard to say whether the code isn’t being over optimized. I
>>>>>>>>>> suspect the real performance will be somewhere in the middle. What would be
>>>>>>>>>> better is something like a restful service with plenty of logging driven by
>>>>>>>>>> Jmeter once with no location and once with it.
>>>>>>>>>>> 
>>>>>>>>>>> Ralph
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> If memory serves correctly, the `includeLocation` was introduced
>>>>>>>>>> because it was necessary for async loggers (and it seemed not feasible to
>>>>>>>>>> get this information from the layout pattern).
>>>>>>>>>>>> 
>>>>>>>>>>>> Async loggers need to know upfront whether they need to pass location
>>>>>>>>>> information to the consumer thread.
>>>>>>>>>>>> 
>>>>>>>>>>>> If obtaining location information becomes so cheap that it’s
>>>>>>>>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>>>>>>>>> include location information anyway.
>>>>>>>>>>>> 
>>>>>>>>>>>> About the performance results: what’s the difference between “Log4j
>>>>>>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>>>>>>>> 
>>>>>>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>>>>>>>>> use JMH?
>>>>>>>>>>>> 
>>>>>>>>>>>> Overall impressive results, congratulations!
>>>>>>>>>>>> 
>>>>>>>>>>>> Remko.
>>>>>>>>>>>> 
>>>>>>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But what do you think about includeLocation. Should it override the
>>>>>>>>>> result of requiresLocation or should the fact that we determined location
>>>>>>>>>> information is required override the includeLocation setting?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> These benchmarks are impressive! Moving the point of call site
>>>>>>>>>> calculation is a great idea.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>>>>>>>>>> <ma...@dslextreme.com>> wrote:
>>>>>>>>>>>>>> I have made changes to the code the master branch to measure and
>>>>>>>>>> improve performance. I have committed some of the changes but not the last
>>>>>>>>>> set. I have not made the latest changes to release-2.x because they will
>>>>>>>>>> have to be implemented a bit differently to avoid compatibility issues. The
>>>>>>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>>>>>>>>> test just basically writes 1.5 million log statements in a loop using an
>>>>>>>>>> equivalent format for each logging framework. Note that JUL is faster with
>>>>>>>>>> location information than without primarily because a custom Handler had to
>>>>>>>>>> be created to format the log record and it performs better than the one
>>>>>>>>>> used to log without location information.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are several things I should note about this:
>>>>>>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>>>>>>>>> modified to use a faster algorithm. Streams are not particularly fast. This
>>>>>>>>>> resulted in some gains not shown here but can be seen in
>>>>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>>>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>>>>>>>>> • Performance is proportional to the number of stack traces that
>>>>>>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>>>>>>>>>> calcLocation. Our stack is very deep because AbstractLogger and
>>>>>>>>>> LoggerConfig have been overly refactored along with having to go though
>>>>>>>>>> several intermediate objects.
>>>>>>>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>>>>>>>>> Any component that requires location information sets this value to true.
>>>>>>>>>> Based on this, the call to calcLocation can be made significantly higher up
>>>>>>>>>> the stack just before the log event is created but only when a configured
>>>>>>>>>> component will use it - with a few exceptions. This code is not yet checked
>>>>>>>>>> in.
>>>>>>>>>>>>>> • If for some reason the location information isn’t calculated when
>>>>>>>>>> the event is constructed it can still be lazily populated later using the
>>>>>>>>>> current, slower mechanism.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As you can see these changes result in a pretty dramatic
>>>>>>>>>> improvement, almost to the point that the cost of obtaining location
>>>>>>>>>> information doesn’t really matter - especially in Java 11. The results in
>>>>>>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>>>>>>>>> logging call with location info. These results could probably be improved
>>>>>>>>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>>>>>>>>> whether location information is required or not for the RoutingAppender
>>>>>>>>>> since it dynamically creates Appenders at run time. Instead, I added a
>>>>>>>>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> However, in testing I have encountered a unit test failure in
>>>>>>>>>> RandomAccessFileAppenderTest. It is failing because the location
>>>>>>>>>> information is being included even though includeLocation is set to false.
>>>>>>>>>> I am wondering if that setting needs to be respected since the pattern
>>>>>>>>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>> Matt Sicker <bo...@gmail.com>
>>>>>>>>> 
>>>>>>>> --
>>>>>>>> Matt Sicker <bo...@gmail.com>
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>> 



Re: Location performance

Posted by Remko Popma <re...@gmail.com>.
This may be of interest:
https://www.researchgate.net/publication/333825812_What's_Wrong_With_My_Benchmark_Results_Studying_Bad_Practices_in_JMH_Benchmarks



> On Jul 6, 2019, at 5:49, Raman Gupta <ro...@gmail.com> wrote:
> 
> I'm no JMH expert... but, there is a lot more going on there than just
> iterations to avoid the extra time for JIT compilation on warmup.
> Checkout https://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf
> for an idea of the kinds of problems naive timer-only benchmarks
> encounter, and the different ways in which JMH solves them. There are
> other good presentations about JMH at https://shipilev.net/ as well.
> 
> Secondly, JMH isn't de-optimizing the JVM (a la client JVM)
> willy-nilly. It just provides an API that allows you to measure what
> you (hopefully) think you are measuring e.g. blackholes to consume
> results that would otherwise be optimized away by the JVM because the
> result would otherwise be unused, and so forth. AFAIK,  it doesn't
> apply these "de-optimizations" on its own -- it just provides an API
> for you to use as needed.
> 
> Thirdly, I don't believe JMH does anything (by default) to optimize
> away GC or anything like that -- it in fact allows you to consider the
> affect of GC by running lots of iterations, so that the affect of GC
> *is* taken into account in the benchmark. It does have a (non-default)
> option to run a GC before each iteration for specific use cases (e.g.
> benchmarks which generate lots of garbage). It also has a GC profiler
> to understand the impact of GC on your results (see
> https://github.com/Valloric/jmh-playground/blob/master/src/jmh/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java).
> 
> Regards,
> Raman
> 
> 
>> On Fri, Jul 5, 2019 at 4:06 PM Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> Actually, I would disagree. First, I don’t think the first thousand or slow iterations running slow would matter much with 1.5 million iterations. Lets say the first thousand take 10 times longer the the rest of the iterations. That is still only the equivalent of 10,000 iterations. That is a blip in 1.5 million.
>> Second, measuring the impact of garbage collection should be a part of the benchmark if your code is generating garbage.
>> 
>> Again, I am not saying JMH doesn’t have its place, but I have to think something weird is going on when JMH is showing the no location benchmark to be about 4 times faster than the location benchmark where the simplistic loop test shows the elapsed time when including the location information to be about twice that of not including. I can tell that the call to StackWalker isn’t being optimized away because it still shows up in YourKit as the most expensive operation. And when I turn on immediateFlush flushing the file shows up in YourKit
>> 
>> Ralph
>> 
>>> On Jul 5, 2019, at 12:18 PM, Matt Sicker <bo...@gmail.com> wrote:
>>> 
>>> What I mean is that JMH is written to avoid including
>>> behind-the-scenes variability of the code execution that changes from
>>> startup until the code has been compiled to native code, garbage
>>> collection, synchronization points for threads (safe points), etc. All
>>> these things become variable noise that affects benchmarks when the
>>> code you're testing executes in an especially short period of time
>>> (e.g., appending a log event is pretty fast relatively speaking).
>>> 
>>> Say you wrote a naive benchmark like this:
>>> 
>>> long start = System.nanoTime();
>>> for (int i = 0; i < 1_000_000; i++) {
>>> runBenchmark();
>>> }
>>> long end = System.nanoTime();
>>> double averageExecutionNanos = (end - start) / 1_000_000.0;
>>> 
>>> This will give incorrect results because the first thousand or so
>>> iterations will be run by the first gen compiler, then the next
>>> iterations will be run via code output from the second gen compiler
>>> which will be significantly faster. So then you might split that up
>>> into a warmup block and a measurement block. You'll still be including
>>> the execution time spent in garbage collection, thread
>>> synchronization, JIT compilation, etc., that isn't relevant to the
>>> benchmark for executing the code over time.
>>> 
>>>> On Fri, 5 Jul 2019 at 01:25, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>>> 
>>>> Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled.  When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized I wouldn’t trust that you will see the same behavior when running in a “real” environment.
>>>> 
>>>> For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods. All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html> <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html>> says is that @State will prevent “certain” optimizations from occurring.
>>>> 
>>>> Don’t get me wrong. I think there are many places where JHM benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Jul 4, 2019, at 9:21 PM, Remko Popma <re...@gmail.com> wrote:
>>>>> 
>>>>> The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them.
>>>>> 
>>>>> Remko.
>>>>> 
>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>> 
>>>>>> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
>>>>>> 
>>>>>> I don’t understand what you mean with you last message.
>>>>>> 
>>>>>> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>> 
>>>>>>> And the checkpoint bias thing is alleviated amongst other noise from the
>>>>>>> JVM that would otherwise invalidate the time measurements.
>>>>>>> 
>>>>>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> I thought JMH ran your code enough for JIT to take effect (though that’s
>>>>>>>> also up to your warmup iteration setting), and the
>>>>>>>> 
>>>>>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>>>>>>>> from master.
>>>>>>>>> 
>>>>>>>>> Benchmark                                       Mode  Cnt       Score
>>>>>>>>> Error  Units
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>>>>>>>> 350.465  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>>>>>>>> 2023.826  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>>>>>>>> 1452.810  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>>>>>>>> 6329.118  ops/s
>>>>>>>>> 
>>>>>>>>> and here are the results with all the changes I have made.
>>>>>>>>> 
>>>>>>>>> Benchmark                                       Mode  Cnt       Score
>>>>>>>>> Error  Units
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>>>>>>>> 1594.591  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>>>>>>>> 2309.118  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>>>>>>>> 3103.615  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>>>>>>>> 2535.387  ops/s
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Remko,
>>>>>>>>>> 
>>>>>>>>>> I ran the existing JHM benchmarks for both no location and with
>>>>>>>>> location and there is still a huge difference. The problem is I don’t
>>>>>>>>> necessarily believe the results. Nor do I think the benchmark I have been
>>>>>>>>> using is totally accurate either. The problem is that, as I understand it,
>>>>>>>>> JMH limits or eliminates much of the optimization the Java runtime will
>>>>>>>>> perform. This is bad for Log4j because in some ways our code is written
>>>>>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>>>>>>>> much logging the same record with an incremented value from the same
>>>>>>>>> location it is hard to say whether the code isn’t being over optimized. I
>>>>>>>>> suspect the real performance will be somewhere in the middle. What would be
>>>>>>>>> better is something like a restful service with plenty of logging driven by
>>>>>>>>> Jmeter once with no location and once with it.
>>>>>>>>>> 
>>>>>>>>>> Ralph
>>>>>>>>>> 
>>>>>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> If memory serves correctly, the `includeLocation` was introduced
>>>>>>>>> because it was necessary for async loggers (and it seemed not feasible to
>>>>>>>>> get this information from the layout pattern).
>>>>>>>>>>> 
>>>>>>>>>>> Async loggers need to know upfront whether they need to pass location
>>>>>>>>> information to the consumer thread.
>>>>>>>>>>> 
>>>>>>>>>>> If obtaining location information becomes so cheap that it’s
>>>>>>>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>>>>>>>> include location information anyway.
>>>>>>>>>>> 
>>>>>>>>>>> About the performance results: what’s the difference between “Log4j
>>>>>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>>>>>>> 
>>>>>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>>>>>>>> use JMH?
>>>>>>>>>>> 
>>>>>>>>>>> Overall impressive results, congratulations!
>>>>>>>>>>> 
>>>>>>>>>>> Remko.
>>>>>>>>>>> 
>>>>>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> But what do you think about includeLocation. Should it override the
>>>>>>>>> result of requiresLocation or should the fact that we determined location
>>>>>>>>> information is required override the includeLocation setting?
>>>>>>>>>>>> 
>>>>>>>>>>>> Ralph
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> These benchmarks are impressive! Moving the point of call site
>>>>>>>>> calculation is a great idea.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>>>>>>>>> <ma...@dslextreme.com>> wrote:
>>>>>>>>>>>>> I have made changes to the code the master branch to measure and
>>>>>>>>> improve performance. I have committed some of the changes but not the last
>>>>>>>>> set. I have not made the latest changes to release-2.x because they will
>>>>>>>>> have to be implemented a bit differently to avoid compatibility issues. The
>>>>>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>>>>>>>> test just basically writes 1.5 million log statements in a loop using an
>>>>>>>>> equivalent format for each logging framework. Note that JUL is faster with
>>>>>>>>> location information than without primarily because a custom Handler had to
>>>>>>>>> be created to format the log record and it performs better than the one
>>>>>>>>> used to log without location information.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There are several things I should note about this:
>>>>>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>>>>>>>> modified to use a faster algorithm. Streams are not particularly fast. This
>>>>>>>>> resulted in some gains not shown here but can be seen in
>>>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>>>>>>>> • Performance is proportional to the number of stack traces that
>>>>>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>>>>>>>>> calcLocation. Our stack is very deep because AbstractLogger and
>>>>>>>>> LoggerConfig have been overly refactored along with having to go though
>>>>>>>>> several intermediate objects.
>>>>>>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>>>>>>>> Any component that requires location information sets this value to true.
>>>>>>>>> Based on this, the call to calcLocation can be made significantly higher up
>>>>>>>>> the stack just before the log event is created but only when a configured
>>>>>>>>> component will use it - with a few exceptions. This code is not yet checked
>>>>>>>>> in.
>>>>>>>>>>>>> • If for some reason the location information isn’t calculated when
>>>>>>>>> the event is constructed it can still be lazily populated later using the
>>>>>>>>> current, slower mechanism.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As you can see these changes result in a pretty dramatic
>>>>>>>>> improvement, almost to the point that the cost of obtaining location
>>>>>>>>> information doesn’t really matter - especially in Java 11. The results in
>>>>>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>>>>>>>> logging call with location info. These results could probably be improved
>>>>>>>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>>>>>>>> whether location information is required or not for the RoutingAppender
>>>>>>>>> since it dynamically creates Appenders at run time. Instead, I added a
>>>>>>>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, in testing I have encountered a unit test failure in
>>>>>>>>> RandomAccessFileAppenderTest. It is failing because the location
>>>>>>>>> information is being included even though includeLocation is set to false.
>>>>>>>>> I am wondering if that setting needs to be respected since the pattern
>>>>>>>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>> Matt Sicker <bo...@gmail.com>
>>>>>>>> 
>>>>>>> --
>>>>>>> Matt Sicker <bo...@gmail.com>
>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>> 

Re: Location performance

Posted by Raman Gupta <ro...@gmail.com>.
I'm no JMH expert... but, there is a lot more going on there than just
iterations to avoid the extra time for JIT compilation on warmup.
Checkout https://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf
for an idea of the kinds of problems naive timer-only benchmarks
encounter, and the different ways in which JMH solves them. There are
other good presentations about JMH at https://shipilev.net/ as well.

Secondly, JMH isn't de-optimizing the JVM (a la client JVM)
willy-nilly. It just provides an API that allows you to measure what
you (hopefully) think you are measuring e.g. blackholes to consume
results that would otherwise be optimized away by the JVM because the
result would otherwise be unused, and so forth. AFAIK,  it doesn't
apply these "de-optimizations" on its own -- it just provides an API
for you to use as needed.

Thirdly, I don't believe JMH does anything (by default) to optimize
away GC or anything like that -- it in fact allows you to consider the
affect of GC by running lots of iterations, so that the affect of GC
*is* taken into account in the benchmark. It does have a (non-default)
option to run a GC before each iteration for specific use cases (e.g.
benchmarks which generate lots of garbage). It also has a GC profiler
to understand the impact of GC on your results (see
https://github.com/Valloric/jmh-playground/blob/master/src/jmh/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java).

Regards,
Raman


On Fri, Jul 5, 2019 at 4:06 PM Ralph Goers <ra...@dslextreme.com> wrote:
>
> Actually, I would disagree. First, I don’t think the first thousand or slow iterations running slow would matter much with 1.5 million iterations. Lets say the first thousand take 10 times longer the the rest of the iterations. That is still only the equivalent of 10,000 iterations. That is a blip in 1.5 million.
> Second, measuring the impact of garbage collection should be a part of the benchmark if your code is generating garbage.
>
> Again, I am not saying JMH doesn’t have its place, but I have to think something weird is going on when JMH is showing the no location benchmark to be about 4 times faster than the location benchmark where the simplistic loop test shows the elapsed time when including the location information to be about twice that of not including. I can tell that the call to StackWalker isn’t being optimized away because it still shows up in YourKit as the most expensive operation. And when I turn on immediateFlush flushing the file shows up in YourKit
>
> Ralph
>
> > On Jul 5, 2019, at 12:18 PM, Matt Sicker <bo...@gmail.com> wrote:
> >
> > What I mean is that JMH is written to avoid including
> > behind-the-scenes variability of the code execution that changes from
> > startup until the code has been compiled to native code, garbage
> > collection, synchronization points for threads (safe points), etc. All
> > these things become variable noise that affects benchmarks when the
> > code you're testing executes in an especially short period of time
> > (e.g., appending a log event is pretty fast relatively speaking).
> >
> > Say you wrote a naive benchmark like this:
> >
> > long start = System.nanoTime();
> > for (int i = 0; i < 1_000_000; i++) {
> >  runBenchmark();
> > }
> > long end = System.nanoTime();
> > double averageExecutionNanos = (end - start) / 1_000_000.0;
> >
> > This will give incorrect results because the first thousand or so
> > iterations will be run by the first gen compiler, then the next
> > iterations will be run via code output from the second gen compiler
> > which will be significantly faster. So then you might split that up
> > into a warmup block and a measurement block. You'll still be including
> > the execution time spent in garbage collection, thread
> > synchronization, JIT compilation, etc., that isn't relevant to the
> > benchmark for executing the code over time.
> >
> > On Fri, 5 Jul 2019 at 01:25, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
> >>
> >> Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled.  When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized I wouldn’t trust that you will see the same behavior when running in a “real” environment.
> >>
> >> For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods. All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html> <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html>> says is that @State will prevent “certain” optimizations from occurring.
> >>
> >> Don’t get me wrong. I think there are many places where JHM benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.
> >>
> >> Ralph
> >>
> >>> On Jul 4, 2019, at 9:21 PM, Remko Popma <re...@gmail.com> wrote:
> >>>
> >>> The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them.
> >>>
> >>> Remko.
> >>>
> >>> (Shameless plug) Every java main() method deserves http://picocli.info
> >>>
> >>>> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
> >>>>
> >>>> I don’t understand what you mean with you last message.
> >>>>
> >>>> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
> >>>>
> >>>> Ralph
> >>>>
> >>>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
> >>>>>
> >>>>> And the checkpoint bias thing is alleviated amongst other noise from the
> >>>>> JVM that would otherwise invalidate the time measurements.
> >>>>>
> >>>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
> >>>>>>
> >>>>>> I thought JMH ran your code enough for JIT to take effect (though that’s
> >>>>>> also up to your warmup iteration setting), and the
> >>>>>>
> >>>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
> >>>>>>> from master.
> >>>>>>>
> >>>>>>> Benchmark                                       Mode  Cnt       Score
> >>>>>>> Error  Units
> >>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
> >>>>>>> 350.465  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
> >>>>>>> 2023.826  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
> >>>>>>> 1452.810  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
> >>>>>>> 6329.118  ops/s
> >>>>>>>
> >>>>>>> and here are the results with all the changes I have made.
> >>>>>>>
> >>>>>>> Benchmark                                       Mode  Cnt       Score
> >>>>>>> Error  Units
> >>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
> >>>>>>> 1594.591  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
> >>>>>>> 2309.118  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
> >>>>>>> 3103.615  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
> >>>>>>> 2535.387  ops/s
> >>>>>>>
> >>>>>>> Ralph
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Remko,
> >>>>>>>>
> >>>>>>>> I ran the existing JHM benchmarks for both no location and with
> >>>>>>> location and there is still a huge difference. The problem is I don’t
> >>>>>>> necessarily believe the results. Nor do I think the benchmark I have been
> >>>>>>> using is totally accurate either. The problem is that, as I understand it,
> >>>>>>> JMH limits or eliminates much of the optimization the Java runtime will
> >>>>>>> perform. This is bad for Log4j because in some ways our code is written
> >>>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
> >>>>>>> much logging the same record with an incremented value from the same
> >>>>>>> location it is hard to say whether the code isn’t being over optimized. I
> >>>>>>> suspect the real performance will be somewhere in the middle. What would be
> >>>>>>> better is something like a restful service with plenty of logging driven by
> >>>>>>> Jmeter once with no location and once with it.
> >>>>>>>>
> >>>>>>>> Ralph
> >>>>>>>>
> >>>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> If memory serves correctly, the `includeLocation` was introduced
> >>>>>>> because it was necessary for async loggers (and it seemed not feasible to
> >>>>>>> get this information from the layout pattern).
> >>>>>>>>>
> >>>>>>>>> Async loggers need to know upfront whether they need to pass location
> >>>>>>> information to the consumer thread.
> >>>>>>>>>
> >>>>>>>>> If obtaining location information becomes so cheap that it’s
> >>>>>>> essentially free, it is fine to ignore the `includeLocation` directive and
> >>>>>>> include location information anyway.
> >>>>>>>>>
> >>>>>>>>> About the performance results: what’s the difference between “Log4j
> >>>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
> >>>>>>>>>
> >>>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
> >>>>>>> use JMH?
> >>>>>>>>>
> >>>>>>>>> Overall impressive results, congratulations!
> >>>>>>>>>
> >>>>>>>>> Remko.
> >>>>>>>>>
> >>>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
> >>>>>>>>>
> >>>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> But what do you think about includeLocation. Should it override the
> >>>>>>> result of requiresLocation or should the fact that we determined location
> >>>>>>> information is required override the includeLocation setting?
> >>>>>>>>>>
> >>>>>>>>>> Ralph
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> These benchmarks are impressive! Moving the point of call site
> >>>>>>> calculation is a great idea.
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
> >>>>>>> <ma...@dslextreme.com>> wrote:
> >>>>>>>>>>> I have made changes to the code the master branch to measure and
> >>>>>>> improve performance. I have committed some of the changes but not the last
> >>>>>>> set. I have not made the latest changes to release-2.x because they will
> >>>>>>> have to be implemented a bit differently to avoid compatibility issues. The
> >>>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
> >>>>>>> test just basically writes 1.5 million log statements in a loop using an
> >>>>>>> equivalent format for each logging framework. Note that JUL is faster with
> >>>>>>> location information than without primarily because a custom Handler had to
> >>>>>>> be created to format the log record and it performs better than the one
> >>>>>>> used to log without location information.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> There are several things I should note about this:
> >>>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
> >>>>>>> modified to use a faster algorithm. Streams are not particularly fast. This
> >>>>>>> resulted in some gains not shown here but can be seen in
> >>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
> >>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
> >>>>>>>>>>> • Performance is proportional to the number of stack traces that
> >>>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
> >>>>>>> calcLocation. Our stack is very deep because AbstractLogger and
> >>>>>>> LoggerConfig have been overly refactored along with having to go though
> >>>>>>> several intermediate objects.
> >>>>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
> >>>>>>> Any component that requires location information sets this value to true.
> >>>>>>> Based on this, the call to calcLocation can be made significantly higher up
> >>>>>>> the stack just before the log event is created but only when a configured
> >>>>>>> component will use it - with a few exceptions. This code is not yet checked
> >>>>>>> in.
> >>>>>>>>>>> • If for some reason the location information isn’t calculated when
> >>>>>>> the event is constructed it can still be lazily populated later using the
> >>>>>>> current, slower mechanism.
> >>>>>>>>>>>
> >>>>>>>>>>> As you can see these changes result in a pretty dramatic
> >>>>>>> improvement, almost to the point that the cost of obtaining location
> >>>>>>> information doesn’t really matter - especially in Java 11. The results in
> >>>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
> >>>>>>> logging call with location info. These results could probably be improved
> >>>>>>> even more if AbstractLogger was “flattened” a bit.
> >>>>>>>>>>>
> >>>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
> >>>>>>> whether location information is required or not for the RoutingAppender
> >>>>>>> since it dynamically creates Appenders at run time. Instead, I added a
> >>>>>>> “requiresLocation” attribute that can be set to indicate that.
> >>>>>>>>>>>
> >>>>>>>>>>> However, in testing I have encountered a unit test failure in
> >>>>>>> RandomAccessFileAppenderTest. It is failing because the location
> >>>>>>> information is being included even though includeLocation is set to false.
> >>>>>>> I am wondering if that setting needs to be respected since the pattern
> >>>>>>> specifies including it. It seems odd to me to not include it in that case.
> >>>>>>>>>>>
> >>>>>>>>>>> Thoughts?
> >>>>>>>>>>>
> >>>>>>>>>>> Ralph
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>> Matt Sicker <bo...@gmail.com>
> >>>>>>
> >>>>> --
> >>>>> Matt Sicker <bo...@gmail.com>
> >>>>
> >>>>
> >>
> >
> >
> > --
> > Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>

Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
Actually, I would disagree. First, I don’t think the first thousand or slow iterations running slow would matter much with 1.5 million iterations. Lets say the first thousand take 10 times longer the the rest of the iterations. That is still only the equivalent of 10,000 iterations. That is a blip in 1.5 million.
Second, measuring the impact of garbage collection should be a part of the benchmark if your code is generating garbage. 

Again, I am not saying JMH doesn’t have its place, but I have to think something weird is going on when JMH is showing the no location benchmark to be about 4 times faster than the location benchmark where the simplistic loop test shows the elapsed time when including the location information to be about twice that of not including. I can tell that the call to StackWalker isn’t being optimized away because it still shows up in YourKit as the most expensive operation. And when I turn on immediateFlush flushing the file shows up in YourKit 

Ralph

> On Jul 5, 2019, at 12:18 PM, Matt Sicker <bo...@gmail.com> wrote:
> 
> What I mean is that JMH is written to avoid including
> behind-the-scenes variability of the code execution that changes from
> startup until the code has been compiled to native code, garbage
> collection, synchronization points for threads (safe points), etc. All
> these things become variable noise that affects benchmarks when the
> code you're testing executes in an especially short period of time
> (e.g., appending a log event is pretty fast relatively speaking).
> 
> Say you wrote a naive benchmark like this:
> 
> long start = System.nanoTime();
> for (int i = 0; i < 1_000_000; i++) {
>  runBenchmark();
> }
> long end = System.nanoTime();
> double averageExecutionNanos = (end - start) / 1_000_000.0;
> 
> This will give incorrect results because the first thousand or so
> iterations will be run by the first gen compiler, then the next
> iterations will be run via code output from the second gen compiler
> which will be significantly faster. So then you might split that up
> into a warmup block and a measurement block. You'll still be including
> the execution time spent in garbage collection, thread
> synchronization, JIT compilation, etc., that isn't relevant to the
> benchmark for executing the code over time.
> 
> On Fri, 5 Jul 2019 at 01:25, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>> 
>> Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled.  When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized I wouldn’t trust that you will see the same behavior when running in a “real” environment.
>> 
>> For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods. All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html> <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html>> says is that @State will prevent “certain” optimizations from occurring.
>> 
>> Don’t get me wrong. I think there are many places where JHM benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.
>> 
>> Ralph
>> 
>>> On Jul 4, 2019, at 9:21 PM, Remko Popma <re...@gmail.com> wrote:
>>> 
>>> The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them.
>>> 
>>> Remko.
>>> 
>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>> 
>>>> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
>>>> 
>>>> I don’t understand what you mean with you last message.
>>>> 
>>>> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
>>>>> 
>>>>> And the checkpoint bias thing is alleviated amongst other noise from the
>>>>> JVM that would otherwise invalidate the time measurements.
>>>>> 
>>>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
>>>>>> 
>>>>>> I thought JMH ran your code enough for JIT to take effect (though that’s
>>>>>> also up to your warmup iteration setting), and the
>>>>>> 
>>>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>>>>>> from master.
>>>>>>> 
>>>>>>> Benchmark                                       Mode  Cnt       Score
>>>>>>> Error  Units
>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>>>>>> 350.465  ops/s
>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>>>>>> 2023.826  ops/s
>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>>>>>> 1452.810  ops/s
>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>>>>>> 6329.118  ops/s
>>>>>>> 
>>>>>>> and here are the results with all the changes I have made.
>>>>>>> 
>>>>>>> Benchmark                                       Mode  Cnt       Score
>>>>>>> Error  Units
>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>>>>>> 1594.591  ops/s
>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>>>>>> 2309.118  ops/s
>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>>>>>> 3103.615  ops/s
>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>>>>>> 2535.387  ops/s
>>>>>>> 
>>>>>>> Ralph
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Remko,
>>>>>>>> 
>>>>>>>> I ran the existing JHM benchmarks for both no location and with
>>>>>>> location and there is still a huge difference. The problem is I don’t
>>>>>>> necessarily believe the results. Nor do I think the benchmark I have been
>>>>>>> using is totally accurate either. The problem is that, as I understand it,
>>>>>>> JMH limits or eliminates much of the optimization the Java runtime will
>>>>>>> perform. This is bad for Log4j because in some ways our code is written
>>>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>>>>>> much logging the same record with an incremented value from the same
>>>>>>> location it is hard to say whether the code isn’t being over optimized. I
>>>>>>> suspect the real performance will be somewhere in the middle. What would be
>>>>>>> better is something like a restful service with plenty of logging driven by
>>>>>>> Jmeter once with no location and once with it.
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> If memory serves correctly, the `includeLocation` was introduced
>>>>>>> because it was necessary for async loggers (and it seemed not feasible to
>>>>>>> get this information from the layout pattern).
>>>>>>>>> 
>>>>>>>>> Async loggers need to know upfront whether they need to pass location
>>>>>>> information to the consumer thread.
>>>>>>>>> 
>>>>>>>>> If obtaining location information becomes so cheap that it’s
>>>>>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>>>>>> include location information anyway.
>>>>>>>>> 
>>>>>>>>> About the performance results: what’s the difference between “Log4j
>>>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>>>>> 
>>>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>>>>>> use JMH?
>>>>>>>>> 
>>>>>>>>> Overall impressive results, congratulations!
>>>>>>>>> 
>>>>>>>>> Remko.
>>>>>>>>> 
>>>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>>>>> 
>>>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> But what do you think about includeLocation. Should it override the
>>>>>>> result of requiresLocation or should the fact that we determined location
>>>>>>> information is required override the includeLocation setting?
>>>>>>>>>> 
>>>>>>>>>> Ralph
>>>>>>>>>> 
>>>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> These benchmarks are impressive! Moving the point of call site
>>>>>>> calculation is a great idea.
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>>>>>>> <ma...@dslextreme.com>> wrote:
>>>>>>>>>>> I have made changes to the code the master branch to measure and
>>>>>>> improve performance. I have committed some of the changes but not the last
>>>>>>> set. I have not made the latest changes to release-2.x because they will
>>>>>>> have to be implemented a bit differently to avoid compatibility issues. The
>>>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>>>>>> test just basically writes 1.5 million log statements in a loop using an
>>>>>>> equivalent format for each logging framework. Note that JUL is faster with
>>>>>>> location information than without primarily because a custom Handler had to
>>>>>>> be created to format the log record and it performs better than the one
>>>>>>> used to log without location information.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> There are several things I should note about this:
>>>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>>>>>> modified to use a faster algorithm. Streams are not particularly fast. This
>>>>>>> resulted in some gains not shown here but can be seen in
>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>>>>>> • Performance is proportional to the number of stack traces that
>>>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>>>>>>> calcLocation. Our stack is very deep because AbstractLogger and
>>>>>>> LoggerConfig have been overly refactored along with having to go though
>>>>>>> several intermediate objects.
>>>>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>>>>>> Any component that requires location information sets this value to true.
>>>>>>> Based on this, the call to calcLocation can be made significantly higher up
>>>>>>> the stack just before the log event is created but only when a configured
>>>>>>> component will use it - with a few exceptions. This code is not yet checked
>>>>>>> in.
>>>>>>>>>>> • If for some reason the location information isn’t calculated when
>>>>>>> the event is constructed it can still be lazily populated later using the
>>>>>>> current, slower mechanism.
>>>>>>>>>>> 
>>>>>>>>>>> As you can see these changes result in a pretty dramatic
>>>>>>> improvement, almost to the point that the cost of obtaining location
>>>>>>> information doesn’t really matter - especially in Java 11. The results in
>>>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>>>>>> logging call with location info. These results could probably be improved
>>>>>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>>>>>> 
>>>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>>>>>> whether location information is required or not for the RoutingAppender
>>>>>>> since it dynamically creates Appenders at run time. Instead, I added a
>>>>>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>>>>>> 
>>>>>>>>>>> However, in testing I have encountered a unit test failure in
>>>>>>> RandomAccessFileAppenderTest. It is failing because the location
>>>>>>> information is being included even though includeLocation is set to false.
>>>>>>> I am wondering if that setting needs to be respected since the pattern
>>>>>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>>>>>> 
>>>>>>>>>>> Thoughts?
>>>>>>>>>>> 
>>>>>>>>>>> Ralph
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>> Matt Sicker <bo...@gmail.com>
>>>>>> 
>>>>> --
>>>>> Matt Sicker <bo...@gmail.com>
>>>> 
>>>> 
>> 
> 
> 
> -- 
> Matt Sicker <boards@gmail.com <ma...@gmail.com>>


Re: Location performance

Posted by Matt Sicker <bo...@gmail.com>.
What I mean is that JMH is written to avoid including
behind-the-scenes variability of the code execution that changes from
startup until the code has been compiled to native code, garbage
collection, synchronization points for threads (safe points), etc. All
these things become variable noise that affects benchmarks when the
code you're testing executes in an especially short period of time
(e.g., appending a log event is pretty fast relatively speaking).

Say you wrote a naive benchmark like this:

long start = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
  runBenchmark();
}
long end = System.nanoTime();
double averageExecutionNanos = (end - start) / 1_000_000.0;

This will give incorrect results because the first thousand or so
iterations will be run by the first gen compiler, then the next
iterations will be run via code output from the second gen compiler
which will be significantly faster. So then you might split that up
into a warmup block and a measurement block. You'll still be including
the execution time spent in garbage collection, thread
synchronization, JIT compilation, etc., that isn't relevant to the
benchmark for executing the code over time.

On Fri, 5 Jul 2019 at 01:25, Ralph Goers <ra...@dslextreme.com> wrote:
>
> Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled.  When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized I wouldn’t trust that you will see the same behavior when running in a “real” environment.
>
> For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods. All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html> says is that @State will prevent “certain” optimizations from occurring.
>
> Don’t get me wrong. I think there are many places where JHM benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.
>
> Ralph
>
> > On Jul 4, 2019, at 9:21 PM, Remko Popma <re...@gmail.com> wrote:
> >
> > The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them.
> >
> > Remko.
> >
> > (Shameless plug) Every java main() method deserves http://picocli.info
> >
> >> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
> >>
> >> I don’t understand what you mean with you last message.
> >>
> >> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
> >>
> >> Ralph
> >>
> >>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
> >>>
> >>> And the checkpoint bias thing is alleviated amongst other noise from the
> >>> JVM that would otherwise invalidate the time measurements.
> >>>
> >>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
> >>>>
> >>>> I thought JMH ran your code enough for JIT to take effect (though that’s
> >>>> also up to your warmup iteration setting), and the
> >>>>
> >>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
> >>>> wrote:
> >>>>
> >>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
> >>>>> from master.
> >>>>>
> >>>>> Benchmark                                       Mode  Cnt       Score
> >>>>> Error  Units
> >>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
> >>>>> 350.465  ops/s
> >>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
> >>>>> 2023.826  ops/s
> >>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
> >>>>> 1452.810  ops/s
> >>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
> >>>>> 6329.118  ops/s
> >>>>>
> >>>>> and here are the results with all the changes I have made.
> >>>>>
> >>>>> Benchmark                                       Mode  Cnt       Score
> >>>>> Error  Units
> >>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
> >>>>> 1594.591  ops/s
> >>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
> >>>>> 2309.118  ops/s
> >>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
> >>>>> 3103.615  ops/s
> >>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
> >>>>> 2535.387  ops/s
> >>>>>
> >>>>> Ralph
> >>>>>
> >>>>>
> >>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Remko,
> >>>>>>
> >>>>>> I ran the existing JHM benchmarks for both no location and with
> >>>>> location and there is still a huge difference. The problem is I don’t
> >>>>> necessarily believe the results. Nor do I think the benchmark I have been
> >>>>> using is totally accurate either. The problem is that, as I understand it,
> >>>>> JMH limits or eliminates much of the optimization the Java runtime will
> >>>>> perform. This is bad for Log4j because in some ways our code is written
> >>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
> >>>>> much logging the same record with an incremented value from the same
> >>>>> location it is hard to say whether the code isn’t being over optimized. I
> >>>>> suspect the real performance will be somewhere in the middle. What would be
> >>>>> better is something like a restful service with plenty of logging driven by
> >>>>> Jmeter once with no location and once with it.
> >>>>>>
> >>>>>> Ralph
> >>>>>>
> >>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> If memory serves correctly, the `includeLocation` was introduced
> >>>>> because it was necessary for async loggers (and it seemed not feasible to
> >>>>> get this information from the layout pattern).
> >>>>>>>
> >>>>>>> Async loggers need to know upfront whether they need to pass location
> >>>>> information to the consumer thread.
> >>>>>>>
> >>>>>>> If obtaining location information becomes so cheap that it’s
> >>>>> essentially free, it is fine to ignore the `includeLocation` directive and
> >>>>> include location information anyway.
> >>>>>>>
> >>>>>>> About the performance results: what’s the difference between “Log4j
> >>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
> >>>>>>>
> >>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
> >>>>> use JMH?
> >>>>>>>
> >>>>>>> Overall impressive results, congratulations!
> >>>>>>>
> >>>>>>> Remko.
> >>>>>>>
> >>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
> >>>>>>>
> >>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>> But what do you think about includeLocation. Should it override the
> >>>>> result of requiresLocation or should the fact that we determined location
> >>>>> information is required override the includeLocation setting?
> >>>>>>>>
> >>>>>>>> Ralph
> >>>>>>>>
> >>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> These benchmarks are impressive! Moving the point of call site
> >>>>> calculation is a great idea.
> >>>>>>>>>
> >>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
> >>>>> <ma...@dslextreme.com>> wrote:
> >>>>>>>>> I have made changes to the code the master branch to measure and
> >>>>> improve performance. I have committed some of the changes but not the last
> >>>>> set. I have not made the latest changes to release-2.x because they will
> >>>>> have to be implemented a bit differently to avoid compatibility issues. The
> >>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
> >>>>> test just basically writes 1.5 million log statements in a loop using an
> >>>>> equivalent format for each logging framework. Note that JUL is faster with
> >>>>> location information than without primarily because a custom Handler had to
> >>>>> be created to format the log record and it performs better than the one
> >>>>> used to log without location information.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> There are several things I should note about this:
> >>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
> >>>>> modified to use a faster algorithm. Streams are not particularly fast. This
> >>>>> resulted in some gains not shown here but can be seen in
> >>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
> >>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
> >>>>>>>>> • Performance is proportional to the number of stack traces that
> >>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
> >>>>> calcLocation. Our stack is very deep because AbstractLogger and
> >>>>> LoggerConfig have been overly refactored along with having to go though
> >>>>> several intermediate objects.
> >>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
> >>>>> Any component that requires location information sets this value to true.
> >>>>> Based on this, the call to calcLocation can be made significantly higher up
> >>>>> the stack just before the log event is created but only when a configured
> >>>>> component will use it - with a few exceptions. This code is not yet checked
> >>>>> in.
> >>>>>>>>> • If for some reason the location information isn’t calculated when
> >>>>> the event is constructed it can still be lazily populated later using the
> >>>>> current, slower mechanism.
> >>>>>>>>>
> >>>>>>>>> As you can see these changes result in a pretty dramatic
> >>>>> improvement, almost to the point that the cost of obtaining location
> >>>>> information doesn’t really matter - especially in Java 11. The results in
> >>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
> >>>>> logging call with location info. These results could probably be improved
> >>>>> even more if AbstractLogger was “flattened” a bit.
> >>>>>>>>>
> >>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
> >>>>> whether location information is required or not for the RoutingAppender
> >>>>> since it dynamically creates Appenders at run time. Instead, I added a
> >>>>> “requiresLocation” attribute that can be set to indicate that.
> >>>>>>>>>
> >>>>>>>>> However, in testing I have encountered a unit test failure in
> >>>>> RandomAccessFileAppenderTest. It is failing because the location
> >>>>> information is being included even though includeLocation is set to false.
> >>>>> I am wondering if that setting needs to be respected since the pattern
> >>>>> specifies including it. It seems odd to me to not include it in that case.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> Ralph
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>> Matt Sicker <bo...@gmail.com>
> >>>>
> >>> --
> >>> Matt Sicker <bo...@gmail.com>
> >>
> >>
>


-- 
Matt Sicker <bo...@gmail.com>

Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled.  When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized I wouldn’t trust that you will see the same behavior when running in a “real” environment.  

For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods. All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html <https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html> says is that @State will prevent “certain” optimizations from occurring.

Don’t get me wrong. I think there are many places where JHM benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.

Ralph

> On Jul 4, 2019, at 9:21 PM, Remko Popma <re...@gmail.com> wrote:
> 
> The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them. 
> 
> Remko.
> 
> (Shameless plug) Every java main() method deserves http://picocli.info
> 
>> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> I don’t understand what you mean with you last message.
>> 
>> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
>> 
>> Ralph
>> 
>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
>>> 
>>> And the checkpoint bias thing is alleviated amongst other noise from the
>>> JVM that would otherwise invalidate the time measurements.
>>> 
>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
>>>> 
>>>> I thought JMH ran your code enough for JIT to take effect (though that’s
>>>> also up to your warmup iteration setting), and the
>>>> 
>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
>>>> wrote:
>>>> 
>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>>>> from master.
>>>>> 
>>>>> Benchmark                                       Mode  Cnt       Score
>>>>> Error  Units
>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>>>> 350.465  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>>>> 2023.826  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>>>> 1452.810  ops/s
>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>>>> 6329.118  ops/s
>>>>> 
>>>>> and here are the results with all the changes I have made.
>>>>> 
>>>>> Benchmark                                       Mode  Cnt       Score
>>>>> Error  Units
>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>>>> 1594.591  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>>>> 2309.118  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>>>> 3103.615  ops/s
>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>>>> 2535.387  ops/s
>>>>> 
>>>>> Ralph
>>>>> 
>>>>> 
>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>>>>> wrote:
>>>>>> 
>>>>>> Remko,
>>>>>> 
>>>>>> I ran the existing JHM benchmarks for both no location and with
>>>>> location and there is still a huge difference. The problem is I don’t
>>>>> necessarily believe the results. Nor do I think the benchmark I have been
>>>>> using is totally accurate either. The problem is that, as I understand it,
>>>>> JMH limits or eliminates much of the optimization the Java runtime will
>>>>> perform. This is bad for Log4j because in some ways our code is written
>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>>>> much logging the same record with an incremented value from the same
>>>>> location it is hard to say whether the code isn’t being over optimized. I
>>>>> suspect the real performance will be somewhere in the middle. What would be
>>>>> better is something like a restful service with plenty of logging driven by
>>>>> Jmeter once with no location and once with it.
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>>>>>>> 
>>>>>>> If memory serves correctly, the `includeLocation` was introduced
>>>>> because it was necessary for async loggers (and it seemed not feasible to
>>>>> get this information from the layout pattern).
>>>>>>> 
>>>>>>> Async loggers need to know upfront whether they need to pass location
>>>>> information to the consumer thread.
>>>>>>> 
>>>>>>> If obtaining location information becomes so cheap that it’s
>>>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>>>> include location information anyway.
>>>>>>> 
>>>>>>> About the performance results: what’s the difference between “Log4j
>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>>> 
>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>>>> use JMH?
>>>>>>> 
>>>>>>> Overall impressive results, congratulations!
>>>>>>> 
>>>>>>> Remko.
>>>>>>> 
>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>>> 
>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>> But what do you think about includeLocation. Should it override the
>>>>> result of requiresLocation or should the fact that we determined location
>>>>> information is required override the includeLocation setting?
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> These benchmarks are impressive! Moving the point of call site
>>>>> calculation is a great idea.
>>>>>>>>> 
>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>>>>> <ma...@dslextreme.com>> wrote:
>>>>>>>>> I have made changes to the code the master branch to measure and
>>>>> improve performance. I have committed some of the changes but not the last
>>>>> set. I have not made the latest changes to release-2.x because they will
>>>>> have to be implemented a bit differently to avoid compatibility issues. The
>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>>>> test just basically writes 1.5 million log statements in a loop using an
>>>>> equivalent format for each logging framework. Note that JUL is faster with
>>>>> location information than without primarily because a custom Handler had to
>>>>> be created to format the log record and it performs better than the one
>>>>> used to log without location information.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> There are several things I should note about this:
>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>>>> modified to use a faster algorithm. Streams are not particularly fast. This
>>>>> resulted in some gains not shown here but can be seen in
>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>>>> • Performance is proportional to the number of stack traces that
>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>>>>> calcLocation. Our stack is very deep because AbstractLogger and
>>>>> LoggerConfig have been overly refactored along with having to go though
>>>>> several intermediate objects.
>>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>>>> Any component that requires location information sets this value to true.
>>>>> Based on this, the call to calcLocation can be made significantly higher up
>>>>> the stack just before the log event is created but only when a configured
>>>>> component will use it - with a few exceptions. This code is not yet checked
>>>>> in.
>>>>>>>>> • If for some reason the location information isn’t calculated when
>>>>> the event is constructed it can still be lazily populated later using the
>>>>> current, slower mechanism.
>>>>>>>>> 
>>>>>>>>> As you can see these changes result in a pretty dramatic
>>>>> improvement, almost to the point that the cost of obtaining location
>>>>> information doesn’t really matter - especially in Java 11. The results in
>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>>>> logging call with location info. These results could probably be improved
>>>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>>>> 
>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>>>> whether location information is required or not for the RoutingAppender
>>>>> since it dynamically creates Appenders at run time. Instead, I added a
>>>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>>>> 
>>>>>>>>> However, in testing I have encountered a unit test failure in
>>>>> RandomAccessFileAppenderTest. It is failing because the location
>>>>> information is being included even though includeLocation is set to false.
>>>>> I am wondering if that setting needs to be respected since the pattern
>>>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>> Matt Sicker <bo...@gmail.com>
>>>> 
>>> -- 
>>> Matt Sicker <bo...@gmail.com>
>> 
>> 


Re: Location performance

Posted by Remko Popma <re...@gmail.com>.
The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them. 

Remko.

(Shameless plug) Every java main() method deserves http://picocli.info

> On Jul 5, 2019, at 11:25, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> I don’t understand what you mean with you last message.
> 
> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
> 
> Ralph
> 
>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
>> 
>> And the checkpoint bias thing is alleviated amongst other noise from the
>> JVM that would otherwise invalidate the time measurements.
>> 
>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
>>> 
>>> I thought JMH ran your code enough for JIT to take effect (though that’s
>>> also up to your warmup iteration setting), and the
>>> 
>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
>>> wrote:
>>> 
>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>>> from master.
>>>> 
>>>> Benchmark                                       Mode  Cnt       Score
>>>> Error  Units
>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>>> 350.465  ops/s
>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>>> 2023.826  ops/s
>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>>> 1452.810  ops/s
>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>>> 6329.118  ops/s
>>>> 
>>>> and here are the results with all the changes I have made.
>>>> 
>>>> Benchmark                                       Mode  Cnt       Score
>>>> Error  Units
>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>>> 1594.591  ops/s
>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>>> 2309.118  ops/s
>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>>> 3103.615  ops/s
>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>>> 2535.387  ops/s
>>>> 
>>>> Ralph
>>>> 
>>>> 
>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>>>> wrote:
>>>>> 
>>>>> Remko,
>>>>> 
>>>>> I ran the existing JHM benchmarks for both no location and with
>>>> location and there is still a huge difference. The problem is I don’t
>>>> necessarily believe the results. Nor do I think the benchmark I have been
>>>> using is totally accurate either. The problem is that, as I understand it,
>>>> JMH limits or eliminates much of the optimization the Java runtime will
>>>> perform. This is bad for Log4j because in some ways our code is written
>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>>> much logging the same record with an incremented value from the same
>>>> location it is hard to say whether the code isn’t being over optimized. I
>>>> suspect the real performance will be somewhere in the middle. What would be
>>>> better is something like a restful service with plenty of logging driven by
>>>> Jmeter once with no location and once with it.
>>>>> 
>>>>> Ralph
>>>>> 
>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>>>>>> 
>>>>>> If memory serves correctly, the `includeLocation` was introduced
>>>> because it was necessary for async loggers (and it seemed not feasible to
>>>> get this information from the layout pattern).
>>>>>> 
>>>>>> Async loggers need to know upfront whether they need to pass location
>>>> information to the consumer thread.
>>>>>> 
>>>>>> If obtaining location information becomes so cheap that it’s
>>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>>> include location information anyway.
>>>>>> 
>>>>>> About the performance results: what’s the difference between “Log4j
>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>> 
>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>>> use JMH?
>>>>>> 
>>>>>> Overall impressive results, congratulations!
>>>>>> 
>>>>>> Remko.
>>>>>> 
>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>> 
>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>>>> wrote:
>>>>>>> 
>>>>>>> But what do you think about includeLocation. Should it override the
>>>> result of requiresLocation or should the fact that we determined location
>>>> information is required override the includeLocation setting?
>>>>>>> 
>>>>>>> Ralph
>>>>>>> 
>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> These benchmarks are impressive! Moving the point of call site
>>>> calculation is a great idea.
>>>>>>>> 
>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>>>> <ma...@dslextreme.com>> wrote:
>>>>>>>> I have made changes to the code the master branch to measure and
>>>> improve performance. I have committed some of the changes but not the last
>>>> set. I have not made the latest changes to release-2.x because they will
>>>> have to be implemented a bit differently to avoid compatibility issues. The
>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>>> test just basically writes 1.5 million log statements in a loop using an
>>>> equivalent format for each logging framework. Note that JUL is faster with
>>>> location information than without primarily because a custom Handler had to
>>>> be created to format the log record and it performs better than the one
>>>> used to log without location information.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> There are several things I should note about this:
>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>>> modified to use a faster algorithm. Streams are not particularly fast. This
>>>> resulted in some gains not shown here but can be seen in
>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>>> • Performance is proportional to the number of stack traces that
>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>>>> calcLocation. Our stack is very deep because AbstractLogger and
>>>> LoggerConfig have been overly refactored along with having to go though
>>>> several intermediate objects.
>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>>> Any component that requires location information sets this value to true.
>>>> Based on this, the call to calcLocation can be made significantly higher up
>>>> the stack just before the log event is created but only when a configured
>>>> component will use it - with a few exceptions. This code is not yet checked
>>>> in.
>>>>>>>> • If for some reason the location information isn’t calculated when
>>>> the event is constructed it can still be lazily populated later using the
>>>> current, slower mechanism.
>>>>>>>> 
>>>>>>>> As you can see these changes result in a pretty dramatic
>>>> improvement, almost to the point that the cost of obtaining location
>>>> information doesn’t really matter - especially in Java 11. The results in
>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>>> logging call with location info. These results could probably be improved
>>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>>> 
>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>>> whether location information is required or not for the RoutingAppender
>>>> since it dynamically creates Appenders at run time. Instead, I added a
>>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>>> 
>>>>>>>> However, in testing I have encountered a unit test failure in
>>>> RandomAccessFileAppenderTest. It is failing because the location
>>>> information is being included even though includeLocation is set to false.
>>>> I am wondering if that setting needs to be respected since the pattern
>>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> --
>>> Matt Sicker <bo...@gmail.com>
>>> 
>> -- 
>> Matt Sicker <bo...@gmail.com>
> 
> 

Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
I don’t understand what you mean with you last message.

I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.

Ralph

> On Jul 4, 2019, at 11:15 AM, Matt Sicker <bo...@gmail.com> wrote:
> 
> And the checkpoint bias thing is alleviated amongst other noise from the
> JVM that would otherwise invalidate the time measurements.
> 
> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:
> 
>> I thought JMH ran your code enough for JIT to take effect (though that’s
>> also up to your warmup iteration setting), and the
>> 
>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
>> wrote:
>> 
>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>> from master.
>>> 
>>> Benchmark                                       Mode  Cnt       Score
>>>  Error  Units
>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>> 350.465  ops/s
>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>> 2023.826  ops/s
>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>> 1452.810  ops/s
>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>> 6329.118  ops/s
>>> 
>>> and here are the results with all the changes I have made.
>>> 
>>> Benchmark                                       Mode  Cnt       Score
>>>  Error  Units
>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>> 1594.591  ops/s
>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>> 2309.118  ops/s
>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>> 3103.615  ops/s
>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>> 2535.387  ops/s
>>> 
>>> Ralph
>>> 
>>> 
>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>>> wrote:
>>>> 
>>>> Remko,
>>>> 
>>>> I ran the existing JHM benchmarks for both no location and with
>>> location and there is still a huge difference. The problem is I don’t
>>> necessarily believe the results. Nor do I think the benchmark I have been
>>> using is totally accurate either. The problem is that, as I understand it,
>>> JMH limits or eliminates much of the optimization the Java runtime will
>>> perform. This is bad for Log4j because in some ways our code is written
>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>> much logging the same record with an incremented value from the same
>>> location it is hard to say whether the code isn’t being over optimized. I
>>> suspect the real performance will be somewhere in the middle. What would be
>>> better is something like a restful service with plenty of logging driven by
>>> Jmeter once with no location and once with it.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>>>>> 
>>>>> If memory serves correctly, the `includeLocation` was introduced
>>> because it was necessary for async loggers (and it seemed not feasible to
>>> get this information from the layout pattern).
>>>>> 
>>>>> Async loggers need to know upfront whether they need to pass location
>>> information to the consumer thread.
>>>>> 
>>>>> If obtaining location information becomes so cheap that it’s
>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>> include location information anyway.
>>>>> 
>>>>> About the performance results: what’s the difference between “Log4j
>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>> 
>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>> use JMH?
>>>>> 
>>>>> Overall impressive results, congratulations!
>>>>> 
>>>>> Remko.
>>>>> 
>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>> 
>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>>> wrote:
>>>>>> 
>>>>>> But what do you think about includeLocation. Should it override the
>>> result of requiresLocation or should the fact that we determined location
>>> information is required override the includeLocation setting?
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>>>>> 
>>>>>>> These benchmarks are impressive! Moving the point of call site
>>> calculation is a great idea.
>>>>>>> 
>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>>> <ma...@dslextreme.com>> wrote:
>>>>>>> I have made changes to the code the master branch to measure and
>>> improve performance. I have committed some of the changes but not the last
>>> set. I have not made the latest changes to release-2.x because they will
>>> have to be implemented a bit differently to avoid compatibility issues. The
>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>> test just basically writes 1.5 million log statements in a loop using an
>>> equivalent format for each logging framework. Note that JUL is faster with
>>> location information than without primarily because a custom Handler had to
>>> be created to format the log record and it performs better than the one
>>> used to log without location information.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> There are several things I should note about this:
>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>> modified to use a faster algorithm. Streams are not particularly fast. This
>>> resulted in some gains not shown here but can be seen in
>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>> • Performance is proportional to the number of stack traces that
>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>>> calcLocation. Our stack is very deep because AbstractLogger and
>>> LoggerConfig have been overly refactored along with having to go though
>>> several intermediate objects.
>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>> Any component that requires location information sets this value to true.
>>> Based on this, the call to calcLocation can be made significantly higher up
>>> the stack just before the log event is created but only when a configured
>>> component will use it - with a few exceptions. This code is not yet checked
>>> in.
>>>>>>> • If for some reason the location information isn’t calculated when
>>> the event is constructed it can still be lazily populated later using the
>>> current, slower mechanism.
>>>>>>> 
>>>>>>> As you can see these changes result in a pretty dramatic
>>> improvement, almost to the point that the cost of obtaining location
>>> information doesn’t really matter - especially in Java 11. The results in
>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>> logging call with location info. These results could probably be improved
>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>> 
>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>> whether location information is required or not for the RoutingAppender
>>> since it dynamically creates Appenders at run time. Instead, I added a
>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>> 
>>>>>>> However, in testing I have encountered a unit test failure in
>>> RandomAccessFileAppenderTest. It is failing because the location
>>> information is being included even though includeLocation is set to false.
>>> I am wondering if that setting needs to be respected since the pattern
>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> Ralph
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>> Matt Sicker <bo...@gmail.com>
>> 
> -- 
> Matt Sicker <bo...@gmail.com>



Re: Location performance

Posted by Matt Sicker <bo...@gmail.com>.
And the checkpoint bias thing is alleviated amongst other noise from the
JVM that would otherwise invalidate the time measurements.

On Thu, Jul 4, 2019 at 13:14, Matt Sicker <bo...@gmail.com> wrote:

> I thought JMH ran your code enough for JIT to take effect (though that’s
> also up to your warmup iteration setting), and the
>
> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
> wrote:
>
>> That said, here are the results of the FileAppenderWithLocationBenchmark
>> from master.
>>
>> Benchmark                                       Mode  Cnt       Score
>>   Error  Units
>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>> 350.465  ops/s
>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>> 2023.826  ops/s
>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>> 1452.810  ops/s
>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>> 6329.118  ops/s
>>
>> and here are the results with all the changes I have made.
>>
>> Benchmark                                       Mode  Cnt       Score
>>   Error  Units
>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>> 1594.591  ops/s
>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>> 2309.118  ops/s
>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>> 3103.615  ops/s
>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>> 2535.387  ops/s
>>
>> Ralph
>>
>>
>> > On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
>> wrote:
>> >
>> > Remko,
>> >
>> > I ran the existing JHM benchmarks for both no location and with
>> location and there is still a huge difference. The problem is I don’t
>> necessarily believe the results. Nor do I think the benchmark I have been
>> using is totally accurate either. The problem is that, as I understand it,
>> JMH limits or eliminates much of the optimization the Java runtime will
>> perform. This is bad for Log4j because in some ways our code is written
>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>> much logging the same record with an incremented value from the same
>> location it is hard to say whether the code isn’t being over optimized. I
>> suspect the real performance will be somewhere in the middle. What would be
>> better is something like a restful service with plenty of logging driven by
>> Jmeter once with no location and once with it.
>> >
>> > Ralph
>> >
>> >> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>> >>
>> >> If memory serves correctly, the `includeLocation` was introduced
>> because it was necessary for async loggers (and it seemed not feasible to
>> get this information from the layout pattern).
>> >>
>> >> Async loggers need to know upfront whether they need to pass location
>> information to the consumer thread.
>> >>
>> >> If obtaining location information becomes so cheap that it’s
>> essentially free, it is fine to ignore the `includeLocation` directive and
>> include location information anyway.
>> >>
>> >> About the performance results: what’s the difference between “Log4j
>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>> >>
>> >> Also I am generally cautious of custom benchmarks; any reason not to
>> use JMH?
>> >>
>> >> Overall impressive results, congratulations!
>> >>
>> >> Remko.
>> >>
>> >> (Shameless plug) Every java main() method deserves http://picocli.info
>> >>
>> >>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
>> wrote:
>> >>>
>> >>> But what do you think about includeLocation. Should it override the
>> result of requiresLocation or should the fact that we determined location
>> information is required override the includeLocation setting?
>> >>>
>> >>> Ralph
>> >>>
>> >>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>> >>>>
>> >>>> These benchmarks are impressive! Moving the point of call site
>> calculation is a great idea.
>> >>>>
>> >>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
>> <ma...@dslextreme.com>> wrote:
>> >>>> I have made changes to the code the master branch to measure and
>> improve performance. I have committed some of the changes but not the last
>> set. I have not made the latest changes to release-2.x because they will
>> have to be implemented a bit differently to avoid compatibility issues. The
>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>> test just basically writes 1.5 million log statements in a loop using an
>> equivalent format for each logging framework. Note that JUL is faster with
>> location information than without primarily because a custom Handler had to
>> be created to format the log record and it performs better than the one
>> used to log without location information.
>> >>>>
>> >>>>
>> >>>>
>> >>>> There are several things I should note about this:
>> >>>>  • The first step, which is in the PR, has StackLocator.calcLocation
>> modified to use a faster algorithm. Streams are not particularly fast. This
>> resulted in some gains not shown here but can be seen in
>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>> >>>>  • Performance is proportional to the number of stack traces that
>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
>> calcLocation. Our stack is very deep because AbstractLogger and
>> LoggerConfig have been overly refactored along with having to go though
>> several intermediate objects.
>> >>>>  • I introduced a requiredLocation() method throughout the stack.
>> Any component that requires location information sets this value to true.
>> Based on this, the call to calcLocation can be made significantly higher up
>> the stack just before the log event is created but only when a configured
>> component will use it - with a few exceptions. This code is not yet checked
>> in.
>> >>>>  • If for some reason the location information isn’t calculated when
>> the event is constructed it can still be lazily populated later using the
>> current, slower mechanism.
>> >>>>
>> >>>> As you can see these changes result in a pretty dramatic
>> improvement, almost to the point that the cost of obtaining location
>> information doesn’t really matter - especially in Java 11. The results in
>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>> logging call with location info. These results could probably be improved
>> even more if AbstractLogger was “flattened” a bit.
>> >>>>
>> >>>> I should point out that the logic isn’t perfect. I cannot determine
>> whether location information is required or not for the RoutingAppender
>> since it dynamically creates Appenders at run time. Instead, I added a
>> “requiresLocation” attribute that can be set to indicate that.
>> >>>>
>> >>>> However, in testing I have encountered a unit test failure in
>> RandomAccessFileAppenderTest. It is failing because the location
>> information is being included even though includeLocation is set to false.
>> I am wondering if that setting needs to be respected since the pattern
>> specifies including it. It seems odd to me to not include it in that case.
>> >>>>
>> >>>> Thoughts?
>> >>>>
>> >>>> Ralph
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>> >>>
>> >>
>> >
>> >
>> >
>>
>> --
> Matt Sicker <bo...@gmail.com>
>
-- 
Matt Sicker <bo...@gmail.com>

Re: Location performance

Posted by Matt Sicker <bo...@gmail.com>.
I thought JMH ran your code enough for JIT to take effect (though that’s
also up to your warmup iteration setting), and the

On Thu, Jul 4, 2019 at 00:21, Ralph Goers <ra...@dslextreme.com>
wrote:

> That said, here are the results of the FileAppenderWithLocationBenchmark
> from master.
>
> Benchmark                                       Mode  Cnt       Score
> Error  Units
> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
> 350.465  ops/s
> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
> 2023.826  ops/s
> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
> 1452.810  ops/s
> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
> 6329.118  ops/s
>
> and here are the results with all the changes I have made.
>
> Benchmark                                       Mode  Cnt       Score
> Error  Units
> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
> 1594.591  ops/s
> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
> 2309.118  ops/s
> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
> 3103.615  ops/s
> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
> 2535.387  ops/s
>
> Ralph
>
>
> > On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com>
> wrote:
> >
> > Remko,
> >
> > I ran the existing JHM benchmarks for both no location and with location
> and there is still a huge difference. The problem is I don’t necessarily
> believe the results. Nor do I think the benchmark I have been using is
> totally accurate either. The problem is that, as I understand it, JMH
> limits or eliminates much of the optimization the Java runtime will
> perform. This is bad for Log4j because in some ways our code is written
> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
> much logging the same record with an incremented value from the same
> location it is hard to say whether the code isn’t being over optimized. I
> suspect the real performance will be somewhere in the middle. What would be
> better is something like a restful service with plenty of logging driven by
> Jmeter once with no location and once with it.
> >
> > Ralph
> >
> >> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
> >>
> >> If memory serves correctly, the `includeLocation` was introduced
> because it was necessary for async loggers (and it seemed not feasible to
> get this information from the layout pattern).
> >>
> >> Async loggers need to know upfront whether they need to pass location
> information to the consumer thread.
> >>
> >> If obtaining location information becomes so cheap that it’s
> essentially free, it is fine to ignore the `includeLocation` directive and
> include location information anyway.
> >>
> >> About the performance results: what’s the difference between “Log4j 3.0
> with class and method” and “Log4j 3.0 withLocation()”?
> >>
> >> Also I am generally cautious of custom benchmarks; any reason not to
> use JMH?
> >>
> >> Overall impressive results, congratulations!
> >>
> >> Remko.
> >>
> >> (Shameless plug) Every java main() method deserves http://picocli.info
> >>
> >>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com>
> wrote:
> >>>
> >>> But what do you think about includeLocation. Should it override the
> result of requiresLocation or should the fact that we determined location
> information is required override the includeLocation setting?
> >>>
> >>> Ralph
> >>>
> >>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
> >>>>
> >>>> These benchmarks are impressive! Moving the point of call site
> calculation is a great idea.
> >>>>
> >>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com
> <ma...@dslextreme.com>> wrote:
> >>>> I have made changes to the code the master branch to measure and
> improve performance. I have committed some of the changes but not the last
> set. I have not made the latest changes to release-2.x because they will
> have to be implemented a bit differently to avoid compatibility issues. The
> table below summarizes what I am seeing, at least on my MacBook Pro.  This
> test just basically writes 1.5 million log statements in a loop using an
> equivalent format for each logging framework. Note that JUL is faster with
> location information than without primarily because a custom Handler had to
> be created to format the log record and it performs better than the one
> used to log without location information.
> >>>>
> >>>>
> >>>>
> >>>> There are several things I should note about this:
> >>>>  • The first step, which is in the PR, has StackLocator.calcLocation
> modified to use a faster algorithm. Streams are not particularly fast. This
> resulted in some gains not shown here but can be seen in
> https://issues.apache.org/jira/browse/LOG4J2-2644 <
> https://issues.apache.org/jira/browse/LOG4J2-2644>
> >>>>  • Performance is proportional to the number of stack traces that
> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called
> calcLocation. Our stack is very deep because AbstractLogger and
> LoggerConfig have been overly refactored along with having to go though
> several intermediate objects.
> >>>>  • I introduced a requiredLocation() method throughout the stack. Any
> component that requires location information sets this value to true. Based
> on this, the call to calcLocation can be made significantly higher up the
> stack just before the log event is created but only when a configured
> component will use it - with a few exceptions. This code is not yet checked
> in.
> >>>>  • If for some reason the location information isn’t calculated when
> the event is constructed it can still be lazily populated later using the
> current, slower mechanism.
> >>>>
> >>>> As you can see these changes result in a pretty dramatic improvement,
> almost to the point that the cost of obtaining location information doesn’t
> really matter - especially in Java 11. The results in Java 11 amount to 5us
> per logging call with no location vs 7.5us per logging call with location
> info. These results could probably be improved even more if AbstractLogger
> was “flattened” a bit.
> >>>>
> >>>> I should point out that the logic isn’t perfect. I cannot determine
> whether location information is required or not for the RoutingAppender
> since it dynamically creates Appenders at run time. Instead, I added a
> “requiresLocation” attribute that can be set to indicate that.
> >>>>
> >>>> However, in testing I have encountered a unit test failure in
> RandomAccessFileAppenderTest. It is failing because the location
> information is being included even though includeLocation is set to false.
> I am wondering if that setting needs to be respected since the pattern
> specifies including it. It seems odd to me to not include it in that case.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Ralph
> >>>>
> >>>>
> >>>> --
> >>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
> >>>
> >>
> >
> >
> >
>
> --
Matt Sicker <bo...@gmail.com>

Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
That said, here are the results of the FileAppenderWithLocationBenchmark from master.

Benchmark                                       Mode  Cnt       Score      Error  Units
FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±  350.465  ops/s
FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ± 2023.826  ops/s
FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ± 1452.810  ops/s
FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ± 6329.118  ops/s

and here are the results with all the changes I have made.

Benchmark                                       Mode  Cnt       Score      Error  Units
FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ± 1594.591  ops/s
FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ± 2309.118  ops/s
FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ± 3103.615  ops/s
FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ± 2535.387  ops/s

Ralph


> On Jul 3, 2019, at 8:35 PM, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> Remko,
> 
> I ran the existing JHM benchmarks for both no location and with location and there is still a huge difference. The problem is I don’t necessarily believe the results. Nor do I think the benchmark I have been using is totally accurate either. The problem is that, as I understand it, JMH limits or eliminates much of the optimization the Java runtime will perform. This is bad for Log4j because in some ways our code is written inefficiently and hotspot will optimize it. OTOH, since the test is pretty much logging the same record with an incremented value from the same location it is hard to say whether the code isn’t being over optimized. I suspect the real performance will be somewhere in the middle. What would be better is something like a restful service with plenty of logging driven by Jmeter once with no location and once with it.
> 
> Ralph
> 
>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>> 
>> If memory serves correctly, the `includeLocation` was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern). 
>> 
>> Async loggers need to know upfront whether they need to pass location information to the consumer thread. 
>> 
>> If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway. 
>> 
>> About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>> 
>> Also I am generally cautious of custom benchmarks; any reason not to use JMH?
>> 
>> Overall impressive results, congratulations!
>> 
>> Remko.
>> 
>> (Shameless plug) Every java main() method deserves http://picocli.info
>> 
>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com> wrote:
>>> 
>>> But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?
>>> 
>>> Ralph
>>> 
>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>> 
>>>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>>>> 
>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>>> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>>>> 
>>>> 
>>>> 
>>>> There are several things I should note about this:
>>>>  • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>  • Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
>>>>  • I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>>>  • If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
>>>> 
>>>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>>>> 
>>>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
>>>> 
>>>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Ralph
>>>> 
>>>> 
>>>> -- 
>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>> 
>> 
> 
> 
> 


Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
Remko,

I ran the existing JHM benchmarks for both no location and with location and there is still a huge difference. The problem is I don’t necessarily believe the results. Nor do I think the benchmark I have been using is totally accurate either. The problem is that, as I understand it, JMH limits or eliminates much of the optimization the Java runtime will perform. This is bad for Log4j because in some ways our code is written inefficiently and hotspot will optimize it. OTOH, since the test is pretty much logging the same record with an incremented value from the same location it is hard to say whether the code isn’t being over optimized. I suspect the real performance will be somewhere in the middle. What would be better is something like a restful service with plenty of logging driven by Jmeter once with no location and once with it.

Ralph

> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
> 
> If memory serves correctly, the `includeLocation` was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern). 
> 
> Async loggers need to know upfront whether they need to pass location information to the consumer thread. 
> 
> If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway. 
> 
> About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?
> 
> Also I am generally cautious of custom benchmarks; any reason not to use JMH?
> 
> Overall impressive results, congratulations!
> 
> Remko.
> 
> (Shameless plug) Every java main() method deserves http://picocli.info
> 
>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?
>> 
>> Ralph
>> 
>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>> 
>>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>>> 
>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>>> 
>>> 
>>> 
>>> There are several things I should note about this:
>>>   • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>   • Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
>>>   • I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>>   • If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
>>> 
>>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>>> 
>>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
>>> 
>>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
>>> 
>>> Thoughts?
>>> 
>>> Ralph
>>> 
>>> 
>>> -- 
>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>> 
> 



Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
Never answer email from the phone. It spell checks everything wrong.

Ralph

> On Jul 2, 2019, at 3:54 PM, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> I am using these because the reporter provided them. I will convert them to John at some point.
> 
> Log4J 3.0 is the master branch. Since it uses java 8 some things are easier to do than on release-2.x. The withLocation test uses the new builder I add to the api. The witHLocation method doesn’t have to walk the stack so it is pretty fast.
> 
> Ralph
> 
>> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
>> 
>> If memory serves correctly, the `includeLocation` was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern). 
>> 
>> Async loggers need to know upfront whether they need to pass location information to the consumer thread. 
>> 
>> If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway. 
>> 
>> About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>> 
>> Also I am generally cautious of custom benchmarks; any reason not to use JMH?
>> 
>> Overall impressive results, congratulations!
>> 
>> Remko.
>> 
>> (Shameless plug) Every java main() method deserves http://picocli.info
>> 
>>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com> wrote:
>>> 
>>> But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?
>>> 
>>> Ralph
>>> 
>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>>> 
>>>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>>>> 
>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>>> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>>>> 
>>>> 
>>>> 
>>>> There are several things I should note about this:
>>>>  • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>  • Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
>>>>  • I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>>>  • If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
>>>> 
>>>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>>>> 
>>>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
>>>> 
>>>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Ralph
>>>> 
>>>> 
>>>> -- 
>>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>>> 
>> 
> 
> 
> 



Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
I am using these because the reporter provided them. I will convert them to John at some point.

Log4J 3.0 is the master branch. Since it uses java 8 some things are easier to do than on release-2.x. The withLocation test uses the new builder I add to the api. The witHLocation method doesn’t have to walk the stack so it is pretty fast.

Ralph

> On Jul 2, 2019, at 3:47 PM, Remko Popma <re...@gmail.com> wrote:
> 
> If memory serves correctly, the `includeLocation` was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern). 
> 
> Async loggers need to know upfront whether they need to pass location information to the consumer thread. 
> 
> If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway. 
> 
> About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?
> 
> Also I am generally cautious of custom benchmarks; any reason not to use JMH?
> 
> Overall impressive results, congratulations!
> 
> Remko.
> 
> (Shameless plug) Every java main() method deserves http://picocli.info
> 
>> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com> wrote:
>> 
>> But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?
>> 
>> Ralph
>> 
>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>>> 
>>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>>> 
>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>>> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>>> 
>>> 
>>> 
>>> There are several things I should note about this:
>>>   • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>   • Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
>>>   • I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>>   • If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
>>> 
>>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>>> 
>>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
>>> 
>>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
>>> 
>>> Thoughts?
>>> 
>>> Ralph
>>> 
>>> 
>>> -- 
>>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
>> 
> 



Re: Location performance

Posted by Remko Popma <re...@gmail.com>.
If memory serves correctly, the `includeLocation` was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern). 

Async loggers need to know upfront whether they need to pass location information to the consumer thread. 

If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway. 

About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?

Also I am generally cautious of custom benchmarks; any reason not to use JMH?

Overall impressive results, congratulations!

Remko.

(Shameless plug) Every java main() method deserves http://picocli.info

> On Jul 3, 2019, at 7:26, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?
> 
> Ralph
> 
>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
>> 
>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>> 
>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
>> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>> 
>> 
>> 
>> There are several things I should note about this:
>>    • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
>>    • Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
>>    • I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>    • If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
>> 
>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>> 
>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
>> 
>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
>> 
>> Thoughts?
>> 
>> Ralph
>> 
>> 
>> -- 
>> Matt Sicker <boards@gmail.com <ma...@gmail.com>>
> 

Re: Location performance

Posted by Ralph Goers <ra...@dslextreme.com>.
But what do you think about includeLocation. Should it override the result of requiresLocation or should the fact that we determined location information is required override the includeLocation setting?

Ralph

> On Jul 2, 2019, at 2:34 PM, Matt Sicker <bo...@gmail.com> wrote:
> 
> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
> 
> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ralph.goers@dslextreme.com <ma...@dslextreme.com>> wrote:
> I have made changes to the code the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro.  This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
> 
> 
> 
> There are several things I should note about this:
> 	• The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644 <https://issues.apache.org/jira/browse/LOG4J2-2644>
> 	• Performance is proportional to the number of stack traces that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored along with having to go though several intermediate objects.
> 	• I introduced a requiredLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack just before the log event is created but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
> 	• If for some reason the location information isn’t calculated when the event is constructed it can still be lazily populated later using the current, slower mechanism.
> 
> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info. These results could probably be improved even more if AbstractLogger was “flattened” a bit.
> 
> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that. 
> 
> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected since the pattern specifies including it. It seems odd to me to not include it in that case.
> 
> Thoughts?
> 
> Ralph
> 
> 
> -- 
> Matt Sicker <boards@gmail.com <ma...@gmail.com>>


Re: Location performance

Posted by Matt Sicker <bo...@gmail.com>.
These benchmarks are impressive! Moving the point of call site calculation
is a great idea.

On Tue, 2 Jul 2019 at 16:22, Ralph Goers <ra...@dslextreme.com> wrote:

> I have made changes to the code the master branch to measure and improve
> performance. I have committed some of the changes but not the last set. I
> have not made the latest changes to release-2.x because they will have to
> be implemented a bit differently to avoid compatibility issues. The table
> below summarizes what I am seeing, at least on my MacBook Pro.  This test
> just basically writes 1.5 million log statements in a loop using an
> equivalent format for each logging framework. Note that JUL is faster with
> location information than without primarily because a custom Handler had to
> be created to format the log record and it performs better than the one
> used to log without location information.
>
>
>
> There are several things I should note about this:
> • The first step, which is in the PR, has StackLocator.calcLocation
> modified to use a faster algorithm. Streams are not particularly fast. This
> resulted in some gains not shown here but can be seen in
> https://issues.apache.org/jira/browse/LOG4J2-2644
> • Performance is proportional to the number of stack traces that need to
> be traversed. JUL only had 11 whereas Log4j had 35 before it called
> calcLocation. Our stack is very deep because AbstractLogger and
> LoggerConfig have been overly refactored along with having to go though
> several intermediate objects.
> • I introduced a requiredLocation() method throughout the stack. Any
> component that requires location information sets this value to true. Based
> on this, the call to calcLocation can be made significantly higher up the
> stack just before the log event is created but only when a configured
> component will use it - with a few exceptions. This code is not yet checked
> in.
> • If for some reason the location information isn’t calculated when the
> event is constructed it can still be lazily populated later using the
> current, slower mechanism.
>
> As you can see these changes result in a pretty dramatic improvement,
> almost to the point that the cost of obtaining location information doesn’t
> really matter - especially in Java 11. The results in Java 11 amount to 5us
> per logging call with no location vs 7.5us per logging call with location
> info. These results could probably be improved even more if AbstractLogger
> was “flattened” a bit.
>
> I should point out that the logic isn’t perfect. I cannot determine
> whether location information is required or not for the RoutingAppender
> since it dynamically creates Appenders at run time. Instead, I added a
> “requiresLocation” attribute that can be set to indicate that.
>
> However, in testing I have encountered a unit test failure in
> RandomAccessFileAppenderTest. It is failing because the location
> information is being included even though includeLocation is set to false.
> I am wondering if that setting needs to be respected since the pattern
> specifies including it. It seems odd to me to not include it in that case.
>
> Thoughts?
>
> Ralph
>


-- 
Matt Sicker <bo...@gmail.com>