You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2015/04/13 18:59:51 UTC

[Bug 57808] New: Don't preload all charsets

https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

            Bug ID: 57808
           Summary: Don't preload all charsets
           Product: Tomcat 8
           Version: trunk
          Hardware: PC
                OS: Mac OS X 10.1
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Util
          Assignee: dev@tomcat.apache.org
          Reporter: kustos@gmx.net

Created attachment 32646
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32646&action=edit
simple patch

I analyzed several heap dumps and and for small applications it seems that
Tomcat uses about twice as much memory as Jetty (roughly 20MB vs 10MB). Most of
this seems to come from Charset instances. While the number of Charsets is low
some of them (mostly EBCDIC and GB18030) have quite large coding tables.

The attached patch only preloads UTF-8 and ISO-8859-1, this saves about 5MB.
That doesn't sound drastic but makes Tomcat to go from about 17.5MB to 12.5MB.

Due to the discussion in 51400 I decided not to cache look up failures.

I decided to use a ConcurrentHashMap so that readers never block. In theory a
fully synchronized map could also be used as look up times should be low and
therefore the lock should be uncontended most of the time.

See the attached heap dumps for applications before and after the patch.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #7 from Fredrik Jonson <fr...@jonson.org> ---
(In reply to Mark Thomas from comment #6)

> It means you don't have to cache the misses since misses have their own DoS
> potential.

Is it correct that there are two possible DOS attacks with dynamic charset
loading, i.e with the patch applied:

 1. force permanent increased memory during runtime with requests that specify
    all avaliable but previously unloaded charsets.

 2. force expensive charset name/instance lookup misses with requests
    that specify non-existent charset names.

Any other?

The consequence of threat 1 is limited. It is self-exhausting as soon as all
available charsets have been loaded once. One mitigation could be to use weak
references, if it's worth it?

Threat 2 could be mitigated if charset loading is performed only when the
charset name is valid. Preloading all available charset names on startup will
block threat 2, right? Assuming of course that prefetching all charset names
doesn't nullify the memory reduction.

BTW, assuming the proposal is accepted, I'd like to see dynamic loading as the
new default, and if necessary an option to make tomcat fall back to the old
pre-loading behaviour. Correct and lean should be the default, not an option.

(N.B I'm not a tomcat committer, just a interested user.)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #10 from Konstantin Kolinko <kn...@gmail.com> ---
FYI:
A new enhancement request - bug 58859 "Allow to limit charsets / encodings
supported by Tomcat" - offers a similar feature, but with different
requirements and background.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #6 from Mark Thomas <ma...@apache.org> ---
(In reply to Philippe Marschall from comment #4)
> I have read bug 51400 and my understanding is that preloading is not
> necessary to avoid performance bottlenecks. Preloading was discussed but I
> couldn't find the reasons why preloading was ultimately chosen.

It means you don't have to cache the misses since misses have their own DoS
potential.

I wonder if lazy-loading the cache with ISO-8859-1 and UTF-8 pre-loaded would
give a sensible balance without adding another configuration option?

I'd feel more comfortable in these discussions with some numbers to quantify
the performance impact.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

Philippe Marschall <ku...@gmx.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #32646|0                           |1
        is obsolete|                            |

--- Comment #5 from Philippe Marschall <ku...@gmx.net> ---
Created attachment 32662
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32662&action=edit
2nd try at a patch

Updated patch:
- keeps the current behavior as default
- new behavior is available and configurable through a system property
- document the system property

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #3 from Konstantin Kolinko <kn...@gmail.com> ---
This preloading is necessary to avoid performance bottlenecks. (See history of
that code, or search Bugzilla  -> bug 51400).

As such, it cannot be removed.

If one wants, it shall be possible to provide a user-configurable list of
charsets as an opt-in feature (the same as suggested in comment 2), but Tomcat
will behave as all the other charsets do not exist.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #4 from Philippe Marschall <ku...@gmx.net> ---
I have read bug 51400 and my understanding is that preloading is not necessary
to avoid performance bottlenecks. Preloading was discussed but I couldn't find
the reasons why preloading was ultimately chosen. The reasons for other
implementation decisions I could find (not caching misses, not storing
permutations, …).
My understanding is that avoiding calling (indirectly) Charset.forName() /
Charset.lookup() during normal request processing is necessary to avoid
performance bottlenecks. This is achieved by the caching even when avoiding the
preloading.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #8 from Mark Thomas <ma...@apache.org> ---
(In reply to Fredrik Jonson from comment #7)
> (In reply to Mark Thomas from comment #6)
> 
> > It means you don't have to cache the misses since misses have their own DoS
> > potential.
> 
> Is it correct that there are two possible DOS attacks with dynamic charset
> loading, i.e with the patch applied:
> 
>  1. force permanent increased memory during runtime with requests that
> specify
>     all avaliable but previously unloaded charsets.
> 
>  2. force expensive charset name/instance lookup misses with requests
>     that specify non-existent charset names.

Just to nit pick, the expensive lookups are (at least were when this code was
written) for any Charset (valid or invalid) that are not in the cache.

> Any other?
> 
> The consequence of threat 1 is limited. It is self-exhausting as soon as all
> available charsets have been loaded once.

Agreed.

> One mitigation could be to use weak references, if it's worth it?

I don't think so.

> Threat 2 could be mitigated if charset loading is performed only when the
> charset name is valid. Preloading all available charset names on startup
> will block threat 2, right?

Correct.

> Assuming of course that prefetching all charset
> names doesn't nullify the memory reduction.

I think it does. At keast the way we get the names of the valid charsets
currently it does,

> BTW, assuming the proposal is accepted, I'd like to see dynamic loading as
> the new default, and if necessary an option to make tomcat fall back to the
> old pre-loading behaviour. Correct and lean should be the default, not an
> option.

No objections there.

> (N.B I'm not a tomcat committer, just a interested user.)

A few good patches can soon change that ;)

The discussion in bug 51400 suggests that the performance characteristics we
were trying to avoid have changed in Java 7 (i.e. Tomcat 8 onwards). That makes
it more important we have some good test results on which to base any decision.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #9 from Mark Thomas <ma...@apache.org> ---
I've been doing some further investigations and I suspect that the results are
highly dependent on the JVM version used.

With recent Java 8 releases the memory retained by this cache is relatively low
(around 100k) and while you can reduce that by not preloading, the gains you
make are lost by what you have to add to account for lazy-loading while
retaining the performance benefits.

Given that there is not a compelling case for changing the current behaviour, I
am resolving this as WONTFIX.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

Christopher Schultz <ch...@christopherschultz.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |PatchAvailable

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

Fredrik Jonson <fr...@jonson.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fredrik@jonson.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

--- Comment #1 from Philippe Marschall <ku...@gmx.net> ---
The heap dumps would be around 7MB compressed so I can't attach them. Where
should I post them?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 57808] Don't preload all charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57808

Remy Maucherat <re...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from Remy Maucherat <re...@apache.org> ---
In most cases, 5MB would be a trivial amount of memory, so how about using a
flag with a system property for that (or something similar) ?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org