Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/05/17 18:06:45 UTC

[jira] Created: (LUCENE-2467) IndexWriter memory leak when large docs are indexed

IndexWriter memory leak when large docs are indexed
---------------------------------------------------

                 Key: LUCENE-2467
                 URL: https://issues.apache.org/jira/browse/LUCENE-2467
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.3, 2.3.1, 2.3.2, 2.3.3, 2.4, 2.4.1, 2.4.2, 2.9, 2.9.1, 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2, 3.1, 4.0
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 2.9.3, 3.0.2, 3.1, 4.0


Spinoff from the java-user thread "IndexWriter and memory usage"...

IndexWriter has had a long-standing memory leak since LUCENE-843.

When the byte/char/int blocks are recycled to the common pool, the
per-thread DW classes incorrectly still hold a reference to them.

This normally is not a problem, since these buffers will be reused.

But if you index a massive document, causing IW to allocate more than
its configured RAM buffer, then the leak happens.  So you could
have a 16 MB RAM buffer set, but if a huge doc required allocation of
200 MB worth of arrays, those 200 MB are never freed (well, until you
close the IW and deref it from the app).

It's even worse if you use multiple threads: every thread that has
ever had to index a massive document incorrectly holds onto its own
extra arrays.
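
The leak pattern described above can be illustrated with a small,
self-contained sketch. This is not Lucene's actual code: BlockPool,
ThreadState, LeakSketch and all of their methods are hypothetical names
invented for illustration, and the block size and pool budget are
arbitrary. It only shows how a per-thread object that keeps its block
references after recycling pins every block a huge document needed,
even though the shared pool itself stays within its budget.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical shared pool that keeps at most maxPooled free blocks. */
final class BlockPool {
    static final int BLOCK_SIZE = 32 * 1024; // arbitrary size for the sketch

    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int maxPooled;

    BlockPool(int maxPooled) {
        this.maxPooled = maxPooled;
    }

    /** Hands out a pooled block if one exists, otherwise a fresh one. */
    byte[] allocate() {
        byte[] block = free.poll();
        return block != null ? block : new byte[BLOCK_SIZE];
    }

    /** Pools blocks up to the budget; the rest are meant to be garbage collected. */
    void recycle(List<byte[]> blocks) {
        for (byte[] block : blocks) {
            if (free.size() < maxPooled) {
                free.push(block);
            }
        }
    }

    int pooled() {
        return free.size();
    }
}

/** Hypothetical per-thread state that buffers one document at a time. */
final class ThreadState {
    private final BlockPool pool;
    private final List<byte[]> blocks = new ArrayList<>();

    ThreadState(BlockPool pool) {
        this.pool = pool;
    }

    void indexDocument(int blocksNeeded) {
        for (int i = 0; i < blocksNeeded; i++) {
            blocks.add(pool.allocate());
        }
    }

    /** Leaky variant: recycles the blocks but still references all of them. */
    void finishLeaky() {
        pool.recycle(blocks);
    }

    /** Fixed variant: recycles the blocks, then drops this thread's references. */
    void finishFixed() {
        pool.recycle(blocks);
        blocks.clear();
    }

    /** Bytes still reachable through this thread state. */
    long retainedBytes() {
        return (long) blocks.size() * BlockPool.BLOCK_SIZE;
    }
}

final class LeakSketch {
    public static void main(String[] args) {
        // One "massive" document needing 100 blocks against a 4-block pool budget.
        ThreadState leaky = new ThreadState(new BlockPool(4));
        leaky.indexDocument(100);
        leaky.finishLeaky();

        ThreadState fixed = new ThreadState(new BlockPool(4));
        fixed.indexDocument(100);
        fixed.finishFixed();

        System.out.println("leaky thread state retains " + leaky.retainedBytes() + " bytes");
        System.out.println("fixed thread state retains " + fixed.retainedBytes() + " bytes");
    }
}
```

With several threads, each leaky ThreadState would pin its own set of
blocks, which matches the multi-threaded case described above.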


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2467) IndexWriter memory leak when large docs are indexed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2467:
---------------------------------------

    Attachment: LUCENE-2467.patch

Don't hold onto the last doc/analyzer that a given thread state used; don't reuse postings instances anymore (trunk no longer does either).
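
One common way to avoid pinning the last document and analyzer is to
clear the per-thread fields in a finally block once processing ends.
The sketch below shows that idea only; DocState, process and
holdsReferences are hypothetical names, not the classes or methods
touched by the attached patch.

```java
import java.util.function.BiFunction;

/** Hypothetical per-thread state that references a doc and analyzer only while working. */
final class DocState<D, A> {
    private D doc;
    private A analyzer;

    /** Runs the work, then clears both fields even if the work throws. */
    <R> R process(D doc, A analyzer, BiFunction<D, A, R> work) {
        this.doc = doc;
        this.analyzer = analyzer;
        try {
            return work.apply(this.doc, this.analyzer);
        } finally {
            // Drop the references so a large document can be garbage collected.
            this.doc = null;
            this.analyzer = null;
        }
    }

    boolean holdsReferences() {
        return doc != null || analyzer != null;
    }
}
```

Without the finally block, an idle thread state would keep its most
recent document reachable until that thread indexes another one.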




[jira] Updated: (LUCENE-2467) IndexWriter memory leak when large docs are indexed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2467:
---------------------------------------

    Attachment: LUCENE-2467.patch

Attached simple patch.

The patch also fixes a couple of other places where we hold onto memory for too long.




[jira] Reopened: (LUCENE-2467) IndexWriter memory leak when large docs are indexed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-2467:
----------------------------------------


A couple more places to fix...




[jira] Resolved: (LUCENE-2467) IndexWriter memory leak when large docs are indexed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2467.
----------------------------------------

    Resolution: Fixed




[jira] Resolved: (LUCENE-2467) IndexWriter memory leak when large docs are indexed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2467.
----------------------------------------

    Resolution: Fixed

