You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/02/10 15:33:57 UTC

[GitHub] [accumulo] ctubbsii commented on pull request #1920: Use String.intern() instead of WeakHashMap

ctubbsii commented on pull request #1920:
URL: https://github.com/apache/accumulo/pull/1920#issuecomment-776795695


   > In the [discussion thread](https://lists.apache.org/thread.html/r716826039c63640d7b1d983acc60b8c9116ade3e7f68fac39913144c%40%3Cdev.accumulo.apache.org%3E) about this, the Guava code was cited since they recommend against String.intern. I looked at the git history, and the documentation of that recommendation is only 4 months old, so I started looking around for any performance comparisons, and I couldn't find anything recent.
   
   The comment was added in 2020, yes, but based on the git log, it was added as a clarification that it could also be used for String in addition to other immutable types. The note about performance was added based on a comment from 2010, when this functionality was added to Guava back in v3.0. It does not appear to be based on newer performance information, just a note about the original intent.
   
   The more recent performance information (since 2010) I was looking at were at http://java-performance.info/string-intern-in-java-6-7-8/ (and has the same observations as yours below).
   
   > 
   > I took a gist from a [stackoverflow post](https://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern) discussing String.intern performance, and ran it on JDK 11 on a MacBook Pro laptop. While it's probably not the best benchmark, I found that even in JDK 11, String.intern can be significantly slower than a ConcurrentHashMap. It was mostly bad when I tried with a large number of interned strings. For a small number of interned strings, String.intern performance was comparable to ConcurrentHashMap, though lookup (calling intern on an already interned string) was always slower for String.intern. For large numbers of Strings, I was able to get improvements by setting the `-XX:StringTableSize` argument to the VM to increase the table from its JDK11 default of 65536 (JDK 7-10 defaulted to 60013).
   
   I'm curious how small/large your tests were, and which were more reasonable for Accumulo.
   
   > 
   > I think if you aren't exceeding the string table size, then String.intern performance is probably comparable enough. But if you exceed, then it starts to degrade quickly. I'm not suggesting to skip this change, but it's just something to think about.
   
   Right. That's the idea. The default value of 65536 for JDK11 seems reasonable to me for our usage. If it needs to be higher, based on the user's operating environment, then they can tune it accordingly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org