You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2011/06/21 02:47:24 UTC

DO NOT REPLY [Bug 51400] New: Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

             Bug #: 51400
           Summary: Use of "new String(byte[] b, String enc)" hits Sun JVM
                    bottleneck
           Product: Tomcat 6
           Version: 6.0.32
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Catalina
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: dengberg@evernote.com
    Classification: Unclassified


Created attachment 27186
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27186
Patch with optimizations

We're using Tomcat 6 for a high-volume, high-concurrency service (Evernote). 
At times, we've seen a performance slowdown within the service, which we've
traced to a concurrency flaw within the JVM code that translates named
encodings (e.g. "utf-8") into Charsets.  This translates into a number of stuck
threads trying to convert a byte array to a String or vice versa, ala:

  java.lang.Thread.State: BLOCKED (on object monitor)
       at sun.nio.cs.FastCharsetProvider.charsetForName(Unknown Source)
       - waiting to lock <0x00007ff3b4cc85b0> (a sun.nio.cs.StandardCharsets)
       at java.nio.charset.Charset.lookup2(Unknown Source)
       at java.nio.charset.Charset.lookup(Unknown Source)
       at java.nio.charset.Charset.isSupported(Unknown Source)
       at java.lang.StringCoding.lookupCharset(Unknown Source)
       at java.lang.StringCoding.decode(Unknown Source)
       at java.lang.String.<init>(Unknown Source)
       at
org.apache.tomcat.util.buf.ByteChunk.toStringInternal(ByteChunk.java:499)
       at org.apache.tomcat.util.buf.StringCache.toString(StringCache.java:315)
       at org.apache.tomcat.util.buf.ByteChunk.toString(ByteChunk.java:492)
       at
org.apache.tomcat.util.buf.MessageBytes.toString(MessageBytes.java:213)
       at
org.apache.tomcat.util.http.MimeHeaders.getHeader(MimeHeaders.java:319)
       at org.apache.coyote.Request.getHeader(Request.java:330)
       at org.apache.catalina.connector.Request.getHeader(Request.java:1854)
       at
org.apache.catalina.connector.RequestFacade.getHeader(RequestFacade.java:643)

This isn't a true deadlock, since each thread will eventually finish, but it
can
significantly affect concurrency if there are a number of threads making heavy
use of:
   new String(byte[] b, String encoding)
   String.getBytes()
   String.getBytes(String encoding)

This is, unfortunately, a known bottleneck within the JVM:
http://blog.inuus.com/vox/2008/05/the-mysteries-of-java-character-set-performance.html
http://halfbottle.blogspot.com/2009/07/charset-continued-i-wrote-about.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6790402


To avoid this bottleneck in the JVM, we've patched our server to use the
explicit Charset object for String encoding rather than the name of the
charset, and then added a ConcurrentHashMap<String, Charset> to lookup charsets
by encodings.

I've attached a patch with our fixes on 6.0.32

Just as a random FYI - the same issue hits MySQL's Java connector, so we'd
occasionally see Tomcat and MySQL fighting over this same JVM chokepoint: 
http://bugs.mysql.com/bug.php?id=61105

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #6 from Mark Thomas <ma...@apache.org> 2011-06-21 16:13:44 UTC ---
Proposed for 6.0.x

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #27 from Mark Thomas <ma...@apache.org> 2011-06-28 23:26:28 UTC ---
Fixed in 6.0.x and will be included in 6.0.33 onwards.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27186|0                           |1
           is patch|                            |
  Attachment #27186|application/octet-stream    |text/plain
          mime type|                            |

--- Comment #1 from Mark Thomas <ma...@apache.org> 2011-06-21 11:16:15 UTC ---
Comment on attachment 27186
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27186
Patch with optimizations

Mark patch as patch so it can be viewed in the browser.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #14 from Christopher Schultz <ch...@christopherschultz.net> 2011-06-23 20:44:01 UTC ---
Thanks for the patch -- it looks just fine.

If you wanted to, you could even remove the explicit use of SortedMap and just
use "Charset.availableCharsets().entrySet()" as your Iterable.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #30 from Jackie Rosen <ja...@hushmail.com> ---
*** Bug 260998 has been marked as a duplicate of this bug. ***
Seen from the domain http://volichat.com
Page where seen: http://volichat.com/adult-chat-rooms
Marked for reference. Resolved as fixed @bugzilla.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #4 from Mark Thomas <ma...@apache.org> 2011-06-21 15:10:00 UTC ---
Unfortunately, Tomcat 6 implements Servlet 2.5 and that requires Java 1.5
support. That means we can't use any of the String method that use Charset.

http://halfbottle.blogspot.com/2009/07/charset-continued-i-wrote-about.html
does provide a pure Java 1.5 way of doing this.

With this in mind, I am going to go propose a variation of your patch for 6.0.x
that doesn't use any 1.6 methods. Back-porting the 7.0.x patch is too big a
change. I would rather focus on the known issue for 6.0.x.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #13 from Konstantin Preißer <pr...@web.de> 2011-06-23 20:24:24 UTC ---
Created attachment 27202
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27202
Patch to prepopulate available charsets

I made a example of a patch which uses a HashMap to prepopulate all available
Charsets in the current JVM when the class is initialized (I'm sorry if I did
something wrong, I didn't made a patch before).

The patch adds all available charsets and their aliases (by converting them to
lower-case) to the Map. The case normalization solves the DoS danger when a
Client makes requests with lot of different case permutations of the same
charset name, and it allows detection of missing charsets without calling
Charset.forName().

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #8 from Konstantin Kolinko <kn...@gmail.com> 2011-06-22 00:18:40 UTC ---
Maybe we could cache not only found charsets, but charset misses as well.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27212|0                           |1
        is obsolete|                            |
  Attachment #27213|0                           |1
        is obsolete|                            |

--- Comment #22 from Mark Thomas <ma...@apache.org> 2011-06-27 16:10:54 UTC ---
Created attachment 27214
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27214
Patch for tc6.0.x

This just isn't my day ;)

7.0.x is patched.

Updated 6.0.x patch attached.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #9 from Konstantin Preißer <pr...@web.de> 2011-06-23 14:08:49 UTC ---
Hi,

would caching charset misses be a good idea, if the Encoding strings can also
be received from external sources?

For example, if a client makes a POST request to a Servlet and sends this
header: 

Content-Type: application/x-www-form-urlencoded;
charset=this-is-a-non-existing-charset

and a Servlet makes a call to HttpServletRequest.getParameter(...), then
o.a.tomcat.util.buf.B2CConverter.getCharset(String) will be called with a value
of "this-is-a-non-existing-charset". If a client would make tons of requests
with random, invalid charset strings and these misses would be added to a List,
couldn't it lead to a memory leak? (if they would never be deleted)

However, there is static method Charset.availableCharsets() which returns a
SortedMap<String, Charset> of all charsets available by the current JVM. Maybe
this list could be used to build a Map of all available charsets (the aliases
returned by Charset.aliases() would also have to be added)? Then missing
charsets could also be found fast.

However, I think, in B2CConverter.getCharset() the encoding string should be
converted to lower-case/upper-case before a lookup in the Map, to avoid
multiple entries ("uTF-8", "UtF-8" etc.).

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #15 from Mark Thomas <ma...@apache.org> 2011-06-27 13:18:35 UTC ---
The patch doesn't correctly address the DOS concerns, neither does it cache
misses.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #26 from Konstantin Preißer <pr...@web.de> 2011-06-28 18:48:32 UTC ---
Thanks!

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #11 from Konstantin Preißer <pr...@web.de> 2011-06-23 15:25:01 UTC ---
Hi Christopher,

(In reply to comment #10)
> If you read some of the online posts linked from this BZ issue, you'll see
> claims that pre-populating such a cache does not have a noticeable impact on
> performance. Honestly, I'm okay not pre-populating things because there are
> probably a dozen encodings that get any significant amount of real use on the
> web, while Charset.availableCharsets returns 163 different obscure character
> sets.
> 
> I suppose it's a fairly small set of encodings, but with little benefit,
> there's no reason IMO to pre-populate.
You're right; however if I read the reports correctly, this is true if charsets
with valid names only are used for the lookup. But everytime when there is a
loopkup for a non-existing Charset, the JVM-synchronized Charset.lookup() is
called. Probably to speed this up, Konstantin Kolinko suggested to cache
charset missings.

If a list with all avaliable charsets would be pre-populated, including their
aliases, missing charsets could also be determined fast. 

> Actually, I might leave the case in-tact for performance considerations. Yes,
> it's true that utf-8, UTF-8, uTf-8, UTf-8, UtF-8, etc. would all be the same, I
> suspect that only "utf-8" and "UTF-8" will be used in the wild with any
> reasonable frequency. Normalizing case for every lookup is probably a waste of
> time, unless there are significant concerns of DOS using long, non-normalized
> permutations of valid encodings (longest is x-MacCentralEurope with 17
> characters to play with). 17 characters is a lot of permutations (~2MiB),
> though.
Well, on my Windows machine the longest alias (not canonical name) of a charset
is "Extended_UNIX_Code_Packed_Format_for_Japanese" which consists of 39 muatble
characters. The current (trunk) implementation in
o.a.tomcat.util.buf.B2CConverter.getCharset() does not normalize the name, so a
Client could send requests with 2^39 permutations in a Content-Type header
(which would make 49 TiB of Charset strings) ;-)

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #18 from Konstantin Preißer <pr...@web.de> 2011-06-27 14:40:57 UTC ---
Hi Mark,

does the patch not correctly address DoS and charset misses because I didn't
use Locale.US? (Nothing would be added to the Map after initialization, so I
can't see a DoS danger there or how it would not cache misses (besides some
"false-positives" with incorrect charset strings in other Locales)).

However, I see that the patch applied to trunk doesn't add the charset aliases
to the Map, which means Charset strings like "utf8" don't work anymore, but
they worked before that change, because Charset.forName or new String(byte[],
String charsetName) allow charset aliases to be used as encoding names. That is
why I also added the aliases to the Map.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27186|0                           |1
        is obsolete|                            |

--- Comment #5 from Mark Thomas <ma...@apache.org> 2011-06-21 16:09:57 UTC ---
Created attachment 27189
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27189
Updated patch that works with Java 1.5

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27214|0                           |1
        is obsolete|                            |

--- Comment #23 from Mark Thomas <ma...@apache.org> 2011-06-28 07:37:05 UTC ---
Created attachment 27219
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27219
Patch for tc6.0.x

Updated patch based on review comments

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #29 from piky <pi...@gmail.com> ---
Comment on attachment 27219
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27219
Patch for tc6.0.x

Help to

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #24 from Konstantin Preißer <pr...@web.de> 2011-06-28 12:45:26 UTC ---
Hi Mark,
did you see my point about C2BConverter? I can see that you marked that patch
as obsolete (it was for C2BConverter, not B2CConverter), but in trunk,
C2BConverter still uses the Encoding String instead of a Charset to create a
new IntermediateOutputStream (but I forgot to remove "throws
UnsupportedEncodingException" in the patch).

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #28 from Ismael Juma <ml...@juma.me.uk> 2011-07-19 15:56:24 UTC ---
Note that there have been changes in this area in Java 7:

"Using Charset as the parameter does not bring you any performance benefit (in
fact it's slower and bloat) in most use scenarios, use it with caution."
http://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs

The blog above talks about single-byte encodings, but there has been a
follow-up commit for UTF-8 earlier this year.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27211|0                           |1
        is obsolete|                            |

--- Comment #20 from Mark Thomas <ma...@apache.org> 2011-06-27 15:21:10 UTC ---
Created attachment 27212
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27212
Patch for tc6.0.x

Third attempt

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #7 from Konstantin Preißer <pr...@web.de> 2011-06-21 19:25:02 UTC ---
Hi,

could/should this also be fixed for calls to new
OutputStreamWriter(OutputStream out, String charsetName)?

For example, org.apache.tomcat.util.buf.WriteConvertor (in C2BConverter.java)
which seems to be used for ServletResponse.getWriter() calls the constructor of
OutputStreamWriter with charsetName as String, and in the Sun Implementation
this calls Charset.forName() / Charset.lookup().

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #19 from Mark Thomas <ma...@apache.org> 2011-06-27 14:46:55 UTC ---
(In reply to comment #18)
> Hi Mark,
> 
> does the patch not correctly address DoS and charset misses because I didn't
> use Locale.US?

My bad. I mis-read the patch (too  much time working with Eclipse where the old
and new versions are the other way around). Your patch is fine.

> However, I see that the patch applied to trunk doesn't add the charset aliases
> to the Map, which means Charset strings like "utf8" don't work anymore, but
> they worked before that change, because Charset.forName or new String(byte[],
> String charsetName) allow charset aliases to be used as encoding names. That is
> why I also added the aliases to the Map.

Yep - I missed that too. I'll add that shortly.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #16 from Mark Thomas <ma...@apache.org> 2011-06-27 13:51:46 UTC ---
I've updated the 7.0.x code to:
- address the DOS concerns
- pre-populate the cache
- ensure cache misses are efficient

I'll create a patch for 6.0.x shortly.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #12 from Christopher Schultz <ch...@christopherschultz.net> 2011-06-23 20:02:11 UTC ---
> > I suppose it's a fairly small set of encodings, but with little benefit,
> > there's no reason IMO to pre-populate.
>
> You're right; however if I read the reports correctly, this is true if charsets
> with valid names only are used for the lookup. But everytime when there is a
> loopkup for a non-existing Charset, the JVM-synchronized Charset.lookup() is
> called. Probably to speed this up, Konstantin Kolinko suggested to cache
> charset missings.

Duh. I hadn't thought of spurious lookups causing their own synchronization
disasters.

Perhaps the invalid-charset cache could be limited in some way: MRU caches are
easy to build with the standard Java library.

> If a list with all avaliable charsets would be pre-populated, including their
> aliases, missing charsets could also be determined fast. 

True: if the encoding is not supported by the JVM, then it's invalid no matter
what. In that case, case normalization is probably a good thing to do: if it's
not in the case (after normalization), then it's not valid... no reason to ever
call Charset.lookup() after startup.

> Well, on my Windows machine the longest alias (not canonical name) of a charset
> is "Extended_UNIX_Code_Packed_Format_for_Japanese" which consists of 39 mutable
> characters.

Wow.

> The current (trunk) implementation in
> o.a.tomcat.util.buf.B2CConverter.getCharset() does not normalize the name, so a
> Client could send requests with 2^39 permutations in a Content-Type header
> (which would make 49 TiB of Charset strings) ;-)

My math might be wrong, too, but I believe that's only 512GiB if names are
1-byte-per-char, but I think Java does 2-bytes-per-char, so it's 1TiB.

You're right, though: that's pretty huge.

+1 to case normalization.
+1 to LUT pre-population.
-1 to LUT miss caching: it's totally unnecessary given the above.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #25 from Mark Thomas <ma...@apache.org> 2011-06-28 16:07:11 UTC ---
C2BConvertor patched with a slightly modified patch for 7.0.x. I don't propose
back-porting this to 6.0.x

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27189|0                           |1
        is obsolete|                            |
  Attachment #27202|0                           |1
        is obsolete|                            |

--- Comment #17 from Mark Thomas <ma...@apache.org> 2011-06-27 14:07:00 UTC ---
Created attachment 27211
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27211
Updated patch

Updated patch that works with Java 5, pre-populates the cache and handles the
DOS concerns. This is still focussed on the areas reported by the OP where
issues were observed.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

piky <pi...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pikymiky9114@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #3 from Konstantin Preißer <pr...@web.de> 2011-06-21 15:05:53 UTC ---
Hello,
please note that String.getBytes(Charset charset), new String(byte[] bytes,
Charset charset) etc. are available since Java 1.6 only, so these probably
can't be used for Tomcat 6.

According to the second link in the report, Charset.decode() / Charset.encode()
(with ByteBuffer/CharBuffer) can be used for Strings.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #2 from Mark Thomas <ma...@apache.org> 2011-06-21 14:32:27 UTC ---
Thanks for the report and the patch.

I've gone through the 7.0.x codebase and changed almost all of the calls to:
   new String(byte[] b, String encoding)
   String.getBytes()
   String.getBytes(String encoding)
to use a Charset. That fix will be in 7.0.17 onwards.

The 7.0.x patch is a lot larger than the patch you propose for 6.0.x. I'm
currently considering what to propose for 6.0.x.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #21 from Konstantin Preißer <pr...@web.de> 2011-06-27 16:06:52 UTC ---
Created attachment 27213
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27213
Small patch to update C2BConverter

Hi Mark,

(In reply to comment #19)
> My bad. I mis-read the patch (too  much time working with Eclipse where the old
> and new versions are the other way around). Your patch is fine.

> Yep - I missed that too. I'll add that shortly.

Thanks!

Well, I don't mean to be annoying or something, but.. It seems that you forgot
to convert the aliases to lower-case before adding them to the Map, which also
would cause "utf8" not to work. ;-)

May I suggest to also update o.a.tomcat.util.buf.C2BConverter to use a Charset
(because it seems this is used when calling HttpServletResponse.getWriter())?
Please see attached (small) patch.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #10 from Christopher Schultz <ch...@christopherschultz.net> 2011-06-23 14:47:42 UTC ---
> would caching charset misses be a good idea, if the Encoding strings can also
> be received from external sources?

+1 to Konstantin Preißer's DOS concerns.

> However, there is static method Charset.availableCharsets() which returns a
> SortedMap<String, Charset> of all charsets available by the current JVM. Maybe
> this list could be used to build a Map of all available charsets (the aliases
> returned by Charset.aliases() would also have to be added)? Then missing
> charsets could also be found fast.

If you read some of the online posts linked from this BZ issue, you'll see
claims that pre-populating such a cache does not have a noticeable impact on
performance. Honestly, I'm okay not pre-populating things because there are
probably a dozen encodings that get any significant amount of real use on the
web, while Charset.availableCharsets returns 163 different obscure character
sets.

I suppose it's a fairly small set of encodings, but with little benefit,
there's no reason IMO to pre-populate.

> However, I think, in B2CConverter.getCharset() the encoding string should be
> converted to lower-case/upper-case before a lookup in the Map, to avoid
> multiple entries ("uTF-8", "UtF-8" etc.).

Actually, I might leave the case in-tact for performance considerations. Yes,
it's true that utf-8, UTF-8, uTf-8, UTf-8, UtF-8, etc. would all be the same, I
suspect that only "utf-8" and "UTF-8" will be used in the wild with any
reasonable frequency. Normalizing case for every lookup is probably a waste of
time, unless there are significant concerns of DOS using long, non-normalized
permutations of valid encodings (longest is x-MacCentralEurope with 17
characters to play with). 17 characters is a lot of permutations (~2MiB),
though.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org