Posted to log4j-dev@logging.apache.org by Mikael Ståldal <mi...@magine.com> on 2016/05/18 15:29:16 UTC

Garbage-free string encoding performance with UTF-16 charset

It seems like the new garbage-free string encoding method performs poorly
with the UTF-16 charset.

See AbstractStringLayoutStringEncodingBenchmark in log4j-perf, which I just
committed to the master branch.

My results (note utf16Encode, about 5x slower than utf16GetBytes):

Benchmark          Mode    Samples     Score    Error  Units
baseline           sample    90395    24.754 ±  0.484  ns/op
iso8859_1Encode    sample    54514   130.176 ±  2.320  ns/op
iso8859_1GetBytes  sample    64464   126.122 ±  1.184  ns/op
usAsciiEncode      sample    68833   190.550 ±  1.117  ns/op
usAsciiGetBytes    sample    80176   170.556 ±  1.691  ns/op
utf16Encode        sample    86597  2013.954 ± 10.551  ns/op
utf16GetBytes      sample    63696   386.276 ± 46.024  ns/op
utf8Encode         sample    69108   190.773 ±  1.504  ns/op
utf8GetBytes       sample    66561   196.247 ±  1.623  ns/op
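
Judging by the benchmark names, each charset is measured two ways. The *Encode rows presumably go through the new garbage-free path, and the *GetBytes rows presumably call String.getBytes(Charset), which allocates a new byte[] on every call. That reading is an assumption. The following is a minimal sketch of what a garbage-free encoder can look like. A CharsetEncoder, a CharBuffer and a ByteBuffer are allocated once and reused for every message. The class ReusableEncoder and its methods are hypothetical. They are not Log4j's API, and this is not the code in AbstractStringLayout.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not Log4j's AbstractStringLayout code: the encoder and
 * both buffers are allocated once and reused, so encoding a message does not
 * allocate a fresh byte[] the way String.getBytes(Charset) does.
 */
final class ReusableEncoder {
    private final CharsetEncoder encoder;
    private final CharBuffer chars; // reused input buffer
    private final ByteBuffer bytes; // reused output buffer

    ReusableEncoder(Charset charset, int maxChars) {
        // Replace bad input instead of throwing, like String.getBytes(Charset) does.
        this.encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        this.chars = CharBuffer.allocate(maxChars);
        // Output buffer sized from the encoder's own worst-case bytes-per-char estimate.
        this.bytes = ByteBuffer.allocate((int) Math.ceil(maxChars * encoder.maxBytesPerChar()));
    }

    /**
     * Encodes text into the shared output buffer and returns it, flipped for reading.
     * The returned buffer is only valid until the next call.
     */
    ByteBuffer encode(CharSequence text) {
        if (text.length() > chars.capacity()) {
            throw new IllegalArgumentException("text longer than " + chars.capacity() + " chars");
        }
        chars.clear();
        for (int i = 0; i < text.length(); i++) {
            chars.put(text.charAt(i)); // copy without creating an intermediate String
        }
        chars.flip();
        bytes.clear();
        encoder.reset(); // for "UTF-16" the byte-order mark is written again, as getBytes does
        CoderResult result = encoder.encode(chars, bytes, true);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("encoding failed: " + result);
        }
        result = encoder.flush(bytes);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("flush failed: " + result);
        }
        bytes.flip();
        return bytes;
    }

    public static void main(String[] args) {
        ReusableEncoder utf16 = new ReusableEncoder(StandardCharsets.UTF_16, 256);
        StringBuilder msg = new StringBuilder("Hello, log4j");
        // 2-byte byte-order mark + 12 chars x 2 bytes
        System.out.println(utf16.encode(msg).remaining()); // prints 26
    }
}
```

The trade-off in this design is that the returned ByteBuffer is shared state. A caller has to copy it or write it out before the next encode() call.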

-- 
[image: MagineTV]

*Mikael Ståldal*
Senior software developer

*Magine TV*
mikael.staldal@magine.com
Grev Turegatan 3  | 114 46 Stockholm, Sweden  |   www.magine.com

Privileged and/or Confidential Information may be contained in this
message. If you are not the addressee indicated in this message
(or responsible for delivery of the message to such a person), you may not
copy or deliver this message to anyone. In such case,
you should destroy this message and kindly notify the sender by reply
email.

Re: Garbage-free string encoding performance with UTF-16 charset

Posted by Gary Gregory <ga...@gmail.com>.
Ditto, I've only seen UTF-16 used for XML documents. All it takes is one
customer, though ;-) I do not think we need to hold up a release for this.

Gary

On Wed, May 18, 2016 at 9:10 AM, Matt Sicker <bo...@gmail.com> wrote:

> I used UTF-16 to encode an XML file by accident once. That's about the
> extent that I've ever used it.

-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: Garbage-free string encoding performance with UTF-16 charset

Posted by Matt Sicker <bo...@gmail.com>.
I used UTF-16 to encode an XML file by accident once. That's about the
extent that I've ever used it.

On 18 May 2016 at 11:08, Mikael Ståldal <mi...@magine.com> wrote:

> Maybe not, if we assume that most users won't use UTF-16.
>
> (I don't use UTF-16, and I don't know any specific use case for it. I just
> thought it would be good to test it.)
>
> There is no significant difference for US-ASCII, ISO-8859-1 and UTF-8.

-- 
Matt Sicker <bo...@gmail.com>

Re: Garbage-free string encoding performance with UTF-16 charset

Posted by Mikael Ståldal <mi...@magine.com>.
Maybe not, if we assume that most users won't use UTF-16.

(I don't use UTF-16, and I don't know any specific use case for it. I just
thought it would be good to test it.)

There is no significant difference between the Encode and GetBytes variants
for US-ASCII, ISO-8859-1 and UTF-8.
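
To put that comparison in numbers, here is a small sketch that divides each *Encode score by the matching *GetBytes score from the benchmark table in the original post. A ratio above 1 means the Encode variant is slower. It only redoes the arithmetic on the posted mean scores and ignores the error bars. The class name EncodeRatios is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: Encode/GetBytes score ratios from the posted table (ns/op, lower is better). */
final class EncodeRatios {
    static double ratio(double encodeNs, double getBytesNs) {
        return encodeNs / getBytesNs;
    }

    static Map<String, Double> ratios() {
        Map<String, Double> r = new LinkedHashMap<>(); // keeps insertion order for printing
        r.put("ISO-8859-1", ratio(130.176, 126.122));
        r.put("US-ASCII", ratio(190.550, 170.556));
        r.put("UTF-8", ratio(190.773, 196.247));
        r.put("UTF-16", ratio(2013.954, 386.276));
        return r;
    }

    public static void main(String[] args) {
        // Locale.ROOT so the decimal separator is always '.'
        ratios().forEach((cs, x) -> System.out.printf(Locale.ROOT, "%-10s %.2f%n", cs, x));
        // ISO-8859-1 1.03, US-ASCII 1.12, UTF-8 0.97, UTF-16 5.21
    }
}
```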

On Wed, May 18, 2016 at 6:02 PM, Remko Popma <re...@gmail.com> wrote:

> Interesting. I'll take a look tomorrow.
> I don't think this is a showstopper though, would you agree?

Re: Garbage-free string encoding performance with UTF-16 charset

Posted by Remko Popma <re...@gmail.com>.
Interesting. I'll take a look tomorrow.
I don't think this is a showstopper, though. Would you agree?

On Thu, May 19, 2016 at 12:29 AM, Mikael Ståldal <mi...@magine.com>
wrote:

> It seems like the new garbage-free string encoding method performs poorly
> with the UTF-16 charset.