Posted to dev@lucene.apache.org by "Simon Willnauer (JIRA)" <ji...@apache.org> on 2013/05/16 13:37:22 UTC

[jira] [Commented] (LUCENE-4989) Hanging on DocumentsWriterStallControl.waitIfStalled forever

    [ https://issues.apache.org/jira/browse/LUCENE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659448#comment-13659448 ] 

Simon Willnauer commented on LUCENE-4989:
-----------------------------------------

This might be related to LUCENE-5002, I think. This can happen in multiple scenarios. Can you tell whether any other threads are blocked in flush, by any chance?
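
One way to answer that question, besides reading a jstack dump by hand, is to scan the stack traces from inside the indexing JVM. The following is only a rough sketch: the class name FlushThreadScan and the rule "a frame in org.apache.lucene.index whose method name mentions flush" are assumptions made for illustration, not anything Lucene provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FlushThreadScan {
    /**
     * Returns one line per thread that has a frame in org.apache.lucene.index
     * whose method name contains "flush" (heuristic, hypothetical helper).
     */
    static List<String> threadsInFlush(Map<Thread, StackTraceElement[]> dump) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                boolean luceneIndexFrame =
                        frame.getClassName().startsWith("org.apache.lucene.index.");
                boolean flushMethod =
                        frame.getMethodName().toLowerCase(Locale.ROOT).contains("flush");
                if (luceneIndexFrame && flushMethod) {
                    Thread t = e.getKey();
                    hits.add(t.getName() + " [" + t.getState() + "] at " + frame);
                    break; // one line per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Must run inside the JVM that hosts the IndexWriter to be useful.
        for (String line : threadsInFlush(Thread.getAllStackTraces())) {
            System.out.println(line);
        }
    }
}
```

Threads parked in waitIfStalled do not match this filter, because only the method name is checked, so whatever it prints should be the threads actually inside a flush.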


                
> Hanging on DocumentsWriterStallControl.waitIfStalled forever
> ------------------------------------------------------------
>
>                 Key: LUCENE-4989
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4989
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.1
>         Environment: Linux 2.6.32
>            Reporter: Jessica Cheng
>              Labels: hang
>             Fix For: 5.0, 4.3.1
>
>
> In an environment where our underlying storage was timing out on various operations, we find all of our indexing threads eventually stuck in the following state (so far for 4 days):
> "Thread-0" daemon prio=5 Thread id=556  WAITING
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Object.wait(Object.java:503)
> 	at org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:74)
> 	at org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:676)
> 	at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
> 	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
> 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
> 	at ...
> I have not yet enabled detailed logging or tried to reproduce this, but looking at the code, I see that DWFC.abortPendingFlushes does:
>         try {
>           dwpt.abort();
>           doAfterFlush(dwpt);
>         } catch (Throwable ex) {
>           // ignore - keep on aborting the flush queue
>         }
> (and the same for the blocked ones). Since the throwable is ignored, I can't say for sure, but I've seen DWPT.abort throw in other cases, so if it does throw, we'd fail to call doAfterFlush and never properly decrement flushBytes. This can be a problem, right? Is it possible to do this instead:
>         try {
>           dwpt.abort();
>         } catch (Throwable ex) {
>           // ignore - keep on aborting the flush queue
>         } finally {
>           try {
>             doAfterFlush(dwpt);
>           } catch (Throwable ex2) {
>             // ignore - keep on aborting the flush queue
>           }
>         }
> It's ugly but safer. Otherwise, maybe at least add logging for the throwable just to make sure this is/isn't happening.
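
To make the concern in the quoted description concrete, here is a toy model of the accounting. It is not Lucene's code: ToyWriter, AbortAccountingSketch and the byte counts are hypothetical stand-ins, and the only thing modelled is that doAfterFlush is what decrements flushBytes. Under those assumptions, the pattern as reported leaves bytes accounted for when abort() throws, while the proposed try/finally variant does not.

```java
import java.util.Arrays;
import java.util.List;

public class AbortAccountingSketch {
    /** Hypothetical stand-in for a DWPT; not Lucene's class. */
    static final class ToyWriter {
        final long bytes;
        final boolean failOnAbort;

        ToyWriter(long bytes, boolean failOnAbort) {
            this.bytes = bytes;
            this.failOnAbort = failOnAbort;
        }

        void abort() {
            if (failOnAbort) {
                throw new IllegalStateException("simulated storage timeout");
            }
        }
    }

    long flushBytes; // bytes currently accounted to pending flushes

    void doAfterFlush(ToyWriter w) {
        flushBytes -= w.bytes;
    }

    /** Pattern from the first quoted snippet: doAfterFlush is skipped if abort() throws. */
    long abortAsReported(List<ToyWriter> queue) {
        flushBytes = queue.stream().mapToLong(w -> w.bytes).sum();
        for (ToyWriter w : queue) {
            try {
                w.abort();
                doAfterFlush(w);
            } catch (Throwable ex) {
                // ignore - keep on aborting the flush queue
            }
        }
        return flushBytes;
    }

    /** Pattern from the second quoted snippet: doAfterFlush always runs. */
    long abortAsProposed(List<ToyWriter> queue) {
        flushBytes = queue.stream().mapToLong(w -> w.bytes).sum();
        for (ToyWriter w : queue) {
            try {
                w.abort();
            } catch (Throwable ex) {
                // ignore - keep on aborting the flush queue
            } finally {
                try {
                    doAfterFlush(w);
                } catch (Throwable ex2) {
                    // ignore - keep on aborting the flush queue
                }
            }
        }
        return flushBytes;
    }

    public static void main(String[] args) {
        List<ToyWriter> queue =
                Arrays.asList(new ToyWriter(100, false), new ToyWriter(50, true));
        System.out.println(new AbortAccountingSketch().abortAsReported(queue)); // prints 50
        System.out.println(new AbortAccountingSketch().abortAsProposed(queue)); // prints 0
    }
}
```

If the stall check depends on flushBytes dropping back below a limit, the leftover 50 bytes in the first variant are the kind of leak that could keep indexing threads waiting indefinitely; whether this is what actually happened here would still need the logging mentioned above.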

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira