Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2013/04/12 16:26:15 UTC

[jira] [Commented] (LUCENE-4930) Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention

    [ https://issues.apache.org/jira/browse/LUCENE-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630093#comment-13630093 ] 

Uwe Schindler commented on LUCENE-4930:
---------------------------------------

The problem here is a class-static map (it makes no difference whether it is a WeakHashMap or a WeakIdentityMap, which is Lucene's own impl). Because the contents of the map change only rarely, this would be a typical use case for an offloaded reap() thread (like Google Commons Collections offers: the cleanup of the hash map is moved to a separate thread).
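To illustrate the offloaded-reaper idea: the sketch below (hypothetical names, not Lucene's or Guava's actual code) keeps weakly-referenced keys in a concurrent map and lets a dedicated daemon thread block on the ReferenceQueue, so readers never pay for cleanup:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a weak-keyed identity cache whose stale entries are purged
// by a background thread instead of inline on every get()/put().
public class BackgroundReapedCache<K, V> {
    // Weak reference that remembers its key's identity hash, so the map entry
    // can still be located after the referent has been collected.
    private static final class WeakKey<K> extends WeakReference<K> {
        final int hash;
        WeakKey(K key, ReferenceQueue<K> q) {
            super(key, q);
            hash = System.identityHashCode(key);
        }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            if (o == this) return true;
            if (!(o instanceof WeakKey)) return false;
            Object a = get(), b = ((WeakKey<?>) o).get();
            return a != null && a == b; // identity semantics, like WeakIdentityMap
        }
    }

    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    private final Map<WeakKey<K>, V> map = new ConcurrentHashMap<>();

    public BackgroundReapedCache() {
        Thread reaper = new Thread(() -> {
            try {
                while (true) {
                    // remove() blocks until GC enqueues a collected key; the
                    // enqueued Reference is the exact WeakKey instance stored
                    // in the map, so removal works by object identity.
                    Reference<? extends K> ref = queue.remove();
                    map.remove(ref);
                }
            } catch (InterruptedException e) {
                // reaper shut down
            }
        }, "cache-reaper");
        reaper.setDaemon(true);
        reaper.start();
    }

    public V get(K key) { return map.get(new WeakKey<>(key, null)); }
    public void put(K key, V value) { map.put(new WeakKey<>(key, queue), value); }
    public int size() { return map.size(); }
}
```

With this design, readers only touch the lock-free ConcurrentHashMap; the ReferenceQueue lock that shows up in the reporter's stack trace is contended only by the single reaper thread.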

To me it is still strange that you really see this problem: the poll() method on ReferenceQueue uses double-checked locking (it checks only a volatile field and should only take the lock if there is actually something to clean up in the queue). As the contents of the weak hash map only change when new attribute classes are added, this weak map should only need cleanup when you reload cores and new class loaders come into play.
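The fast path described here looks roughly like the following (a simplified sketch of the double-checked pattern, not the actual JDK ReferenceQueue source): the common empty-queue case returns after a single volatile read, and the monitor is entered only when there may be something to dequeue.

```java
// Simplified sketch of double-checked locking as used by a poll() method:
// readers pay one volatile read when the queue is empty, and only take the
// lock when the head looks non-null.
public class CheapPollQueue<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final Object lock = new Object();
    private volatile Node<T> head; // volatile: pollers see enqueues without locking

    public void enqueue(T value) {
        synchronized (lock) {
            Node<T> n = new Node<>(value);
            n.next = head;
            head = n;
        }
    }

    public T poll() {
        if (head == null) {
            return null;          // fast path: no lock taken
        }
        synchronized (lock) {     // slow path: re-check under the lock
            Node<T> h = head;
            if (h == null) return null;
            head = h.next;
            return h.value;
        }
    }
}
```

If the map really is static during indexing, every poll() should hit the lock-free fast path, which is why the reported contention is surprising.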

During normal indexing this should have no contention at all. What Java version are you using, and is there really a visible slowdown caused by this? You gave no numbers! The contention here may show up in stack traces requested from threads, but it's unlikely to have an effect on indexing throughput (because the map is mostly static).

FYI: This code is unchanged since Lucene 2.9!
                
> Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4930
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4930
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.2
>         Environment: Dell blade system with 16 cores
>            Reporter: Karl Wright
>
> Our project is not optimally using full processing power under indexing load on Lucene 4.2.0.  The reason is the AttributeSource.addAttribute() method, which goes through a WeakHashMap synchronizer that is apparently single-threaded for a significant amount of time.  Have a look at the following trace:
> "pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for monitor entry [0x00007f47d19ed000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
>         - waiting to lock <0x00000005c5cd9988> (a java.lang.ref.ReferenceQueue$Lock)
>         at org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
>         at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
>         at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
> …
> We’ve had to make significant changes to the way we were indexing in order not to hit this issue as much, such as indexing using TokenStreams which we reuse, when it would have been more convenient to index with just tokens.  (The reason is that Lucene internally creates TokenStream objects when you pass a token array to IndexableField, and doesn’t reuse them, and the addAttribute() calls cause massive contention as a result.)  However, as you can see from the trace above, we’re still running into contention due to other addAttribute() method calls that are buried deep inside Lucene.
> I can see two ways forward.  Either not use WeakHashMap or use it in a more efficient way, or make darned sure no addAttribute() calls are done in the main code indexing execution path.  (I think it would be easy to fix DocInverterPerField in that way, FWIW.  I just don’t know what we’ll run into next.)
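The reuse workaround the reporter describes can be sketched generically: instead of constructing a fresh stream per document (and paying the addAttribute()/class-lookup cost each time), keep one reusable instance per thread and only reset() it. All class and method names below are hypothetical stand-ins for illustration, not Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-thread reuse pattern: the expensive setup
// (standing in for addAttribute()/weak-map traffic) happens once per thread,
// and each document only pays for reset() plus iteration.
public class ReusableTokenizer {
    private static int constructions = 0; // counts expensive setups (single-threaded demo)

    private List<String> tokens;
    private int pos;

    public ReusableTokenizer() {
        constructions++; // stands in for attribute registration / map lookups
    }

    public void reset(String text) {
        tokens = List.of(text.split("\\s+"));
        pos = 0;
    }

    public String next() {
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }

    public static int constructionCount() { return constructions; }

    // One tokenizer per thread; never constructed per document.
    private static final ThreadLocal<ReusableTokenizer> PER_THREAD =
        ThreadLocal.withInitial(ReusableTokenizer::new);

    public static List<String> tokenize(String doc) {
        ReusableTokenizer t = PER_THREAD.get();
        t.reset(doc);
        List<String> out = new ArrayList<>();
        for (String tok; (tok = t.next()) != null; ) {
            out.add(tok);
        }
        return out;
    }
}
```

The same idea underlies the second way forward proposed above: if the hot indexing path never constructs attribute sources per document, the shared weak map is read-mostly and the contention disappears.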

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org