You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Rene S (JIRA)" <ji...@apache.org> on 2008/06/17 22:52:44 UTC

[jira] Created: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Remove String.intern() from Field.java to increase performance and lower contention
-----------------------------------------------------------------------------------

                 Key: LUCENE-1308
                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
             Project: Lucene - Java
          Issue Type: Improvement
    Affects Versions: 2.3.2
            Reporter: Rene S


Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 

1) it is a native call and therefore slower than anything on the Java level
2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
3) Some VMs show GC problems with strings in the string pool

Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.

We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.

****** The Cache:
  /** Cache to replace the expensive String.intern() call with the java version */
  private final static Map<String, WeakReference<String>> unifiedStringsCache =
  	Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));

****** The access to it, instead of this.name = name.intern;

// unify the strings, but do not use the expensive String.intern() version
// which is not "weak enough", uses the perm space and is a native call
String unifiedName = null;

WeakReference<String> ref = unifiedStringsCache.get(name);
if (ref != null)
{
    unifiedName = ref.get();
}
if (unifiedName == null)
{
    unifiedStringsCache.put(name, new WeakReference(name));
    unifiedName = name;
}
this.name = unifiedName;

I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605934#action_12605934 ] 

Yonik Seeley commented on LUCENE-1308:
--------------------------------------

Removing intern() is perhaps something we could look at for Lucene 3
Here's an older thread on the subject:
http://markmail.org/message/5m245gahezc3pet7

> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene S
>
> Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Posted by "Rene Schwietzke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rene Schwietzke updated LUCENE-1308:
------------------------------------

    Attachment: yad.zip

Test source, code, and data files. Contains a Field.java that can be modified to try the different methods of interning Strings.

> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene Schwietzke
>         Attachments: yad.zip
>
>
> Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Posted by "Rene Schwietzke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606052#action_12606052 ] 

Rene Schwietzke commented on LUCENE-1308:
-----------------------------------------

A wrote a small test case that runs a single thread search, as well as a multithreaded search using the same indexsearcher. Especially when running in a threaded context, the replacement of String.intern() pays off. Even the single thread is faster.

I measured the following numbers:

String.Intern, Single Searcher
[main] Search  took: 3453ms
[Thread-2] Search  took: 17812ms
[Thread-3] Search  took: 18313ms
[Thread-1] Search  took: 18234ms
[Thread-0] Search  took: 18562ms

WeakHashMap, Single Searcher
[main] Search  took: 3156ms
[Thread-3] Search  took: 14953ms
[Thread-1] Search  took: 15593ms
[Thread-0] Search  took: 15656ms
[Thread-2] Search  took: 16188ms

ConcurrentHashMap, Single Searcher
[main] Search  took: 2844ms
[Thread-1] Search  took: 14812ms
[Thread-0] Search  took: 14890ms
[Thread-2] Search  took: 15172ms
[Thread-3] Search  took: 14656ms

> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene Schwietzke
>         Attachments: yad.zip
>
>
> Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605832#action_12605832 ] 

Otis Gospodnetic commented on LUCENE-1308:
------------------------------------------

Rene, can you provide a patch along with unit tests?  Have you or can you run contrib/benchmarks and include your before the changes and after the changes results here, so we can see what difference this change makes?  Thanks.


> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene S
>
> Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Posted by "Rene S (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rene S updated LUCENE-1308:
---------------------------

    Description: 
Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 

1) it is a native call and therefore slower than anything on the Java level
2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
3) Some VMs show GC problems with strings in the string pool

Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.

We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.

The Cache:
/** Cache to replace the expensive String.intern() call with the java version */
private final static Map<String, WeakReference<String>> unifiedStringsCache =
   Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));

The access to it, instead of this.name = name.intern;

// unify the strings, but do not use the expensive String.intern() version
// which is not "weak enough", uses the perm space and is a native call
String unifiedName = null;

WeakReference<String> ref = unifiedStringsCache.get(name);
if (ref != null)
{
    unifiedName = ref.get();
}
if (unifiedName == null)
{
    unifiedStringsCache.put(name, new WeakReference(name));
    unifiedName = name;
}
this.name = unifiedName;

I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.


  was:
Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 

1) it is a native call and therefore slower than anything on the Java level
2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
3) Some VMs show GC problems with strings in the string pool

Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.

We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.

****** The Cache:
  /** Cache to replace the expensive String.intern() call with the java version */
  private final static Map<String, WeakReference<String>> unifiedStringsCache =
  	Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));

****** The access to it, instead of this.name = name.intern;

// unify the strings, but do not use the expensive String.intern() version
// which is not "weak enough", uses the perm space and is a native call
String unifiedName = null;

WeakReference<String> ref = unifiedStringsCache.get(name);
if (ref != null)
{
    unifiedName = ref.get();
}
if (unifiedName == null)
{
    unifiedStringsCache.put(name, new WeakReference(name));
    unifiedName = name;
}
this.name = unifiedName;

I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.



> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene S
>
> Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Closed: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller closed LUCENE-1308.
-------------------------------

    Resolution: Duplicate

> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene Schwietzke
>         Attachments: yad.zip
>
>
> Right now, *document.Field is interning all field names. While this makes sense because it lowers the overall memory consumption, the method intern() of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the String instances and at the same time keeping the pool in the heap space and releasing the String when it is not longer needed. For extra performance in a concurrent environment, a ConcurrentHashMap-like implementation of a weak hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application and the application is not only doing searches (we read a lot of documents from the result). So a single measurement test case could show even more improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I skipped the additional synchronization around the access and take the risk that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org