Posted to dev@lucene.apache.org by "Simon Willnauer (JIRA)" <ji...@apache.org> on 2011/05/12 18:55:47 UTC

[jira] [Created] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

DWFlushControl does not take active DWPT out of the loop on fullFlush
---------------------------------------------------------------------

                 Key: LUCENE-3090
                 URL: https://issues.apache.org/jira/browse/LUCENE-3090
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 4.0
            Reporter: Simon Willnauer
            Assignee: Simon Willnauer
            Priority: Critical
             Fix For: 4.0


We have seen several OOMs on TestNRTThreads, and all of them are caused by DWFlushControl missing DWPTs that are set as flushPending but can't flush due to a full flush going on. That means those DWPTs keep filling up in the background while they should actually be checked out and blocked until the full flush finishes. Furthermore, we currently stall on maxNumThreadStates while we should stall on the number of active thread states. I will attach a patch tomorrow.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034053#comment-13034053 ] 

Simon Willnauer commented on LUCENE-3090:
-----------------------------------------

I did 150 runs for all Lucene Tests incl. contrib - no failure so far. Seems to be good to go.



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034103#comment-13034103 ] 

Simon Willnauer commented on LUCENE-3090:
-----------------------------------------

Thanks, Mike, for the review and testing! It makes me feel better with those asserts in there now... I will commit tomorrow.



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033080#comment-13033080 ] 

Simon Willnauer commented on LUCENE-3090:
-----------------------------------------

bq. But shouldn't stallControl kick in in that case? I.e., we stall all indexing if the number of flush-pending DWPTs is >= the number of active DWPTs, I think?
Right, so let's say we have two active thread states:
1. Thread 1 starts indexing (max RAM is 16 MB). It indexes n docs and has 15.9 MB of RAM used. Now doc n+1 comes in with 5 MB (active mem = 20.9 MB, flush mem = 0 MB).
2. Take it out for flush (active mem = 0 MB, flush mem = 20.9 MB).
3. Thread 2 starts indexing and fills RAM quickly, ending up with 18 MB of memory (active mem = 18 MB, flush mem = 20.9 MB).
4. Take thread 2 out for flush (active mem = 0 MB, flush mem = 38.9 MB).
5. Thread 3 has already started indexing and reaches the RAM threshold (16 MB), so we have active mem = 16 MB, flush mem = 38.9 MB.
6. Take it out for flushing (this is where we currently stall) (active mem = 0 MB, flush mem = 54.9 MB). This is more than 3x the max RAM buffer.

We currently stall when the number of flush-pending DWPTs is > (num active DWPTs + 1). We can reduce that, but maybe we should swap back to RAM-based stalling?
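
The six steps above can be replayed as plain bookkeeping. The following is only a minimal sketch of the arithmetic, not the actual DWFlushControl code: the class name, the fields and the two methods are hypothetical, and memory is tracked in tenths of a megabyte so the sums stay exact.

```java
// Hypothetical sketch: replays the active/flush memory numbers from the steps above.
// Units are tenths of a megabyte (159 == 15.9 MB) to avoid floating-point rounding.
final class FlushAccountingSketch {
    static final long MAX_RAM_BUFFER = 160; // 16 MB, the configured max RAM in the example

    long activeMem = 0; // RAM held by DWPTs that are still indexing
    long flushMem = 0;  // RAM held by DWPTs that were taken out for flush

    // A thread indexes documents and grows the active memory.
    void index(long amount) {
        activeMem += amount;
    }

    // The DWPT is taken out for flush: its bytes move from active to flushing.
    void takeOutForFlush() {
        flushMem += activeMem;
        activeMem = 0;
    }

    static FlushAccountingSketch replay() {
        FlushAccountingSketch s = new FlushAccountingSketch();
        s.index(159);        // step 1: n docs, 15.9 MB
        s.index(50);         // step 1: doc n+1 adds 5 MB, active = 20.9 MB
        s.takeOutForFlush(); // step 2: flush = 20.9 MB
        s.index(180);        // step 3: thread 2 ends up with 18 MB
        s.takeOutForFlush(); // step 4: flush = 38.9 MB
        s.index(160);        // step 5: thread 3 reaches the 16 MB threshold
        s.takeOutForFlush(); // step 6: flush = 54.9 MB
        return s;
    }

    public static void main(String[] args) {
        FlushAccountingSketch s = replay();
        System.out.println("flush mem (tenths of MB): " + s.flushMem);
        System.out.println("more than 3x max RAM buffer: " + (s.flushMem > 3 * MAX_RAM_BUFFER));
    }
}
```

Nothing is ever written to disk in this replay, which mirrors the slow-IO case: flush memory only grows until a stall kicks in.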






[jira] [Updated] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3090:
------------------------------------

    Attachment: LUCENE-3090.patch

new patch - removed leftover assertion from debugging



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033043#comment-13033043 ] 

Robert Muir commented on LUCENE-3090:
-------------------------------------

bq. What does net flush pending mean? We only distinguish between flushing RAM and active RAM, so flushing RAM can easily exceed such a limit if IO is slow...

I/O or just "O"? Should we add a ThrottledIndexInput too? :)



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033024#comment-13033024 ] 

Simon Willnauer commented on LUCENE-3090:
-----------------------------------------

bq. Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM?
What does net flush pending mean? We only distinguish between flushing RAM and active RAM, so flushing RAM can easily exceed such a limit if IO is slow...




[jira] [Updated] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3090:
------------------------------------

    Attachment: LUCENE-3090.patch

Here is a patch. I fixed DWFlushControl to block flushes while a full flush is happening and to make them available once the full flush is done. I also ensure that incoming threads help out with flushing, before indexing their document, if there are DWPTs about to flush but not taken yet. All tests pass (I ran them lots of times :)
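
The blocking behaviour described here could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in the attached patch: the class, its method names and the use of plain strings to stand in for DWPTs are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: DWPTs that become flush-pending during a full flush are parked
// and only handed out for flushing once the full flush has finished.
final class FullFlushBlockingSketch {
    private final Queue<String> flushQueue = new ArrayDeque<>();     // ready to be flushed
    private final Queue<String> blockedFlushes = new ArrayDeque<>(); // parked during full flush
    private boolean fullFlush = false;

    synchronized void markForFullFlush() {
        fullFlush = true;
    }

    // Called when a DWPT crosses its flush threshold.
    synchronized void setFlushPending(String dwpt) {
        if (fullFlush) {
            blockedFlushes.add(dwpt); // checked out and blocked, so it stops filling up
        } else {
            flushQueue.add(dwpt);
        }
    }

    // Called when the full flush is done: blocked DWPTs become available again.
    synchronized void finishFullFlush() {
        fullFlush = false;
        flushQueue.addAll(blockedFlushes);
        blockedFlushes.clear();
    }

    // An incoming indexing thread calls this before indexing its document and
    // helps out by flushing the returned DWPT, if any (null means nothing to do).
    synchronized String nextPendingFlush() {
        return flushQueue.poll();
    }
}
```

An incoming thread would call `nextPendingFlush()` first and only index its own document once that returns null.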



[jira] [Resolved] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3090.
-------------------------------------

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [New])

Committed in revision 1104026.



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033073#comment-13033073 ] 

Michael McCandless commented on LUCENE-3090:
--------------------------------------------

{quote}
bq. Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM?

What does net flush pending mean? We only distinguish between flushing RAM and active RAM, so flushing RAM can easily exceed such a limit if IO is slow...
{quote}

But shouldn't stallControl kick in in that case? I.e., we stall all indexing if the number of flush-pending DWPTs is >= the number of active DWPTs, I think?



[jira] [Updated] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3090:
------------------------------------

    Attachment: LUCENE-3090.patch

Next iteration. This patch changes the stalling mechanism from using the number of flushing DWPTs to netBytes, and stalls at 2 x maxRamBuffer if we flush on number of bytes. Stalling on memory consumption allows us to add an assert / upper bound on the netMemory, which is nice, but it doesn't help if we are not flushing on RAM usage.

I think what we need to do (separate issue) is to allow people to set a maxTotalUsedRam, which defaults to 2x maxRamBuffer if that is set, or Runtime#maxMemory() / 2 if flushing by docCount. That would allow us to stall indexing threads iff we cross that border and there is at least one DWPT flushing or pending.
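
As a rough sketch of the two rules just described (the byte-based stall in this patch and the proposed maxTotalUsedRam default), assuming hypothetical method names and a strict "greater than" comparison that the text above does not spell out:

```java
// Hypothetical sketch of byte-based stalling; none of these names come from the patch.
final class NetBytesStallSketch {

    // Rule from this patch: stall once active + flushing bytes exceed 2 x maxRamBuffer.
    // Whether the real comparison is > or >= is an assumption here.
    static boolean shouldStall(long activeBytes, long flushBytes, long maxRamBufferBytes) {
        return activeBytes + flushBytes > 2 * maxRamBufferBytes;
    }

    // Proposed default for maxTotalUsedRam: 2 x maxRamBuffer when flushing by RAM,
    // otherwise half of what the JVM reports as its maximum memory.
    static long defaultMaxTotalUsedRam(boolean flushByRam, long maxRamBufferBytes) {
        return flushByRam ? 2 * maxRamBufferBytes : Runtime.getRuntime().maxMemory() / 2;
    }

    // Proposed rule: stall iff the limit is crossed and at least one DWPT is
    // flushing or pending, so that something will eventually free memory.
    static boolean shouldStallProposed(long netBytes, long maxTotalUsedRam, int flushingOrPendingDwpts) {
        return netBytes > maxTotalUsedRam && flushingOrPendingDwpts >= 1;
    }
}
```

The "at least one DWPT flushing or pending" condition matters because a stalled indexing thread can only be released by a flush that completes; stalling with nothing in flight would block forever.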



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034095#comment-13034095 ] 

Michael McCandless commented on LUCENE-3090:
--------------------------------------------

Patch looks good but hairy, Simon!

I ran 144 iters of all (Solr+lucene+lucene-contrib) tests.  I hit three fails (one in Solr's TestJoin.testRandomJoin, and two in Solr's HighlighterTest) but I don't think these are related to this patch.



[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032986#comment-13032986 ] 

Michael McCandless commented on LUCENE-3090:
--------------------------------------------

I like the Healthiness -> StallControl renaming :)

Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM?

