Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2009/06/25 11:51:09 UTC

[jira] Created: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

IndexWriter does not properly account for the RAM consumed by pending deletes
-----------------------------------------------------------------------------

                 Key: LUCENE-1717
                 URL: https://issues.apache.org/jira/browse/LUCENE-1717
             Project: Lucene - Java
          Issue Type: Bug
    Affects Versions: 2.4.1, 2.4
            Reporter: Michael McCandless
             Fix For: 2.9


IndexWriter, with autoCommit false, is able to carry buffered deletes for quite some time before materializing them to docIDs (thus freeing up RAM used).

It's only on triggering a merge (or, commit/close) that the deletes are materialized and the RAM is freed.

I expect this in practice is a smallish amount of RAM, but we should still fix it.

I don't have a patch yet so if someone wants to grab this, feel free!!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724010#action_12724010 ] 

Michael McCandless commented on LUCENE-1717:
--------------------------------------------

Sorry, you are correct -- that's the obvious workaround here.

But we default that to unlimited (meaning, flush when the RAM limit is hit), which I think is a good default once we fix the accounting in IndexWriter to properly account for buffered deletes' RAM usage.



[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724074#action_12724074 ] 

Simon Willnauer commented on LUCENE-1717:
-----------------------------------------

Regarding buffered deletes' RAM usage: computing an exact number is quite difficult in this case, as there are many strings involved (Terms with field and value). BufferedDeletes#terms stores <Term, Num> and BufferedDeletes#queries stores <Query, Num>. In both cases the value part is easy to account for, but the memory consumption of a Query in particular is hard to guess, as is the amount of memory a Term takes.

On the other hand, I would like to have a notion of the memory consumption of BufferedDeletes, but the IndexWriter#setRAMBufferSizeMB javadoc clearly says that this does not include the memory used by buffered deletes. I would rather leave it as it is and make it clear in the javadoc / wiki that setMaxBufferedDeleteTerms is the way to go if you run into memory problems. Estimating the memory of buffered deletes feels quite imprecise.

bq. I think is a good default once we fix the accounting in IndexWriter to properly account for buffered deletes' RAM usage.
Is there already an issue to fix the RAM usage?



[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724039#action_12724039 ] 

Michael McCandless commented on LUCENE-1717:
--------------------------------------------

deletesInRAM is in fact cleared, on calling deletesFlushed.update(deletesInRAM).  I.e., that call "transfers" the deletesInRAM to deletesFlushed.

Then, when applyDeletes is called, we clear deletesFlushed.
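
The hand-off described above can be pictured with a small toy model. This is only a sketch, not Lucene's code: ToyBufferedDeletes and ToyDeleteBuffers are hypothetical classes that borrow the names used in this comment (deletesInRAM, deletesFlushed, update, applyDeletes), and delete terms are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a buffer of pending delete terms; not Lucene's BufferedDeletes. */
class ToyBufferedDeletes {
    // Maps a delete term (a plain string here) to the docID bound it applies up to.
    final Map<String, Integer> terms = new HashMap<>();

    /** Copies everything from 'in' into this buffer, then empties 'in'. */
    void update(ToyBufferedDeletes in) {
        terms.putAll(in.terms);
        in.clear();
    }

    void clear() {
        terms.clear();
    }

    int size() {
        return terms.size();
    }
}

/** Toy model of the two-stage hand-off: in-RAM deletes first, flushed deletes second. */
class ToyDeleteBuffers {
    final ToyBufferedDeletes deletesInRAM = new ToyBufferedDeletes();
    final ToyBufferedDeletes deletesFlushed = new ToyBufferedDeletes();

    void bufferDelete(String term, int docIdUpto) {
        deletesInRAM.terms.put(term, docIdUpto);
    }

    /** On flush: transfer the pending deletes, leaving deletesInRAM empty. */
    void flush() {
        deletesFlushed.update(deletesInRAM);
    }

    /** On apply: resolving terms to docIDs is elided; the buffer is cleared afterwards. */
    int applyDeletes() {
        int applied = deletesFlushed.size();
        deletesFlushed.clear();
        return applied;
    }
}
```

In this model the memory held by a delete term is released only once applyDeletes() runs, which is the window the issue is concerned with.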



[jira] Resolved: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1717.
----------------------------------------

    Resolution: Fixed



[jira] Updated: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1717:
---------------------------------------

    Attachment: LUCENE-1717.patch

Attached patch.

I made an estimate of RAM usage for buffered delete terms and docIDs that I think should be fairly close.  Buffered delete Query instances, however, are undercounted (I say this in the javadocs) since measuring that would be rather challenging.
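
To illustrate the kind of estimate described above, here is a minimal sketch. It is not the code from LUCENE-1717.patch: the class ToyDeleteRamEstimator and every byte constant in it are assumptions chosen for illustration only. The flat per-query charge mirrors the point that Query instances end up undercounted.

```java
/** Rough, illustrative RAM estimate for buffered deletes; all constants are assumptions. */
class ToyDeleteRamEstimator {
    // Assumed per-entry overheads in bytes; NOT the values used by the actual patch.
    static final int ASSUMED_MAP_ENTRY_BYTES = 48;   // hash map entry plus boxed docID bound
    static final int ASSUMED_TERM_OBJECT_BYTES = 32; // term object and string header, excluding chars
    static final int BYTES_PER_CHAR = 2;             // Java chars are UTF-16 code units
    static final int BYTES_PER_DOC_ID = 4;           // one int per buffered docID, ignoring boxing
    static final int ASSUMED_QUERY_BYTES = 64;       // flat guess; real queries can be far larger

    private long bytesUsed;

    /** Accounts for one buffered delete-by-term; only the term text's chars are measured. */
    void addTerm(String termText) {
        bytesUsed += ASSUMED_MAP_ENTRY_BYTES + ASSUMED_TERM_OBJECT_BYTES
                + (long) BYTES_PER_CHAR * termText.length();
    }

    /** Accounts for one buffered delete-by-docID. */
    void addDocID() {
        bytesUsed += BYTES_PER_DOC_ID;
    }

    /** Accounts for one buffered delete-by-query with a flat charge (an undercount). */
    void addQuery() {
        bytesUsed += ASSUMED_QUERY_BYTES;
    }

    /** Called once the deletes have been applied and the buffers released. */
    void clear() {
        bytesUsed = 0;
    }

    long bytesUsed() {
        return bytesUsed;
    }
}
```

A writer could add bytesUsed() to the RAM it already tracks for buffered documents when deciding whether to flush.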



[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724085#action_12724085 ] 

Michael McCandless commented on LUCENE-1717:
--------------------------------------------

bq. is there already an issue to fix the RAM usage?

This is the issue.

I think even if our measure is not perfect we should try to take a stab at accounting for RAM usage of deletes; it's a trap, now.  We shouldn't set traps.



[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724041#action_12724041 ] 

Simon Willnauer commented on LUCENE-1717:
-----------------------------------------

Ha, true! I missed that. Never mind!
I did not catch it because it calls the clear methods directly instead of using BufferedDeletes#clear(). It might be easier for people reading the code to spot if we called clear() instead. I will attach a patch for this beautification.



[jira] Assigned: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1717:
------------------------------------------

    Assignee: Michael McCandless



[jira] Updated: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-1717:
------------------------------------

    Attachment: BufferedDeletes_beautification.patch



[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724032#action_12724032 ] 

Simon Willnauer commented on LUCENE-1717:
-----------------------------------------

bq. But we default that to unlimited (meaning, flush when the RAM limit is hit), which I think is a good default once we fix the accounting in IndexWriter to properly account for buffered deletes' RAM usage.
I agree we should track the RAM usage of BufferedDeletes too.
One other thing I wonder about is why deletesInRAM is only cleared on abort. Once IndexWriter#doFlushInternal is executed, the DocWriter pushes deletes from deletesInRAM to deletesFlushed. Shouldn't this call deletesInRAM#clear() to free the memory in this instance of BufferedDeletes? That could at least help a bit, unless I am missing something important.





[jira] Commented: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724007#action_12724007 ] 

Simon Willnauer commented on LUCENE-1717:
-----------------------------------------

Maybe I am missing something, but IndexWriter#setMaxBufferedDeleteTerms can be used to set an upper bound for those terms. Once you hit the upper bound, BufferedDeletes should be flushed to disk by calling IndexWriter#flush(). This can happen with either an add or a delete. Maybe I do not completely understand what you mean by materialized.
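
The count-based bound mentioned above, next to a RAM-based trigger, might look roughly like the following toy policy. It is a sketch under assumptions, not IndexWriter's implementation: ToyFlushPolicy, its DISABLED sentinel and the exact comparison rules are hypothetical.

```java
/** Toy flush policy; names and semantics are illustrative, not Lucene's implementation. */
class ToyFlushPolicy {
    // Hypothetical sentinel meaning "no count-based limit on buffered delete terms".
    static final int DISABLED = -1;

    private final int maxBufferedDeleteTerms;
    private final long ramBudgetBytes;

    ToyFlushPolicy(int maxBufferedDeleteTerms, long ramBudgetBytes) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /**
     * Returns true when either trigger fires: the number of buffered delete terms
     * reached the configured bound, or the tracked RAM reached the budget.
     */
    boolean shouldFlush(int bufferedDeleteTerms, long bytesUsed) {
        boolean countHit = maxBufferedDeleteTerms != DISABLED
                && bufferedDeleteTerms >= maxBufferedDeleteTerms;
        boolean ramHit = bytesUsed >= ramBudgetBytes;
        return countHit || ramHit;
    }
}
```

With the count bound disabled, only the RAM trigger can fire, so any memory that is left out of bytesUsed never counts toward a flush.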
