Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2006/01/21 03:36:41 UTC

[jira] Created: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

adding docs with large (binary) fields of 5mb causes OOM regardless of heap size
--------------------------------------------------------------------------------

         Key: LUCENE-488
         URL: http://issues.apache.org/jira/browse/LUCENE-488
     Project: Lucene - Java
        Type: Bug
    Versions: 1.9    
 Environment: Linux asimov 2.6.6.hoss1 #1 SMP Tue Jul 6 16:31:01 PDT 2004 i686 GNU/Linux

    Reporter: Hoss Man


as reported by George Washington in a message to java-user@lucene.apache.org with subject "Storing large text or binary source documents in the index and memory usage" around 2006-01-21, there seems to be a problem with adding docs containing really large fields.

I'll attach a test case in a moment. Note that (for me) regardless of how big I make my heap size, and regardless of what value I set MIN_MB to, once it starts trying to make documents containing 5mb of data, it can only add 9 before it rolls over and dies.

here's the output from the code I will attach in a moment...

    [junit] Testsuite: org.apache.lucene.document.TestBigBinary
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 78.656 sec

    [junit] ------------- Standard Output ---------------
    [junit] NOTE: directory will not be cleaned up automatically...
    [junit] Dir: /tmp/org.apache.lucene.document.TestBigBinary.97856146.100iters.4mb
    [junit] iters completed: 100
    [junit] totalBytes Allocated: 419430400
    [junit] NOTE: directory will not be cleaned up automatically...
    [junit] Dir: /tmp/org.apache.lucene.document.TestBigBinary.97856146.100iters.5mb
    [junit] iters completed: 9
    [junit] totalBytes Allocated: 52428800
    [junit] ------------- ---------------- ---------------
    [junit] Testcase: testBigBinaryFields(org.apache.lucene.document.TestBigBinary):    Caused an ERROR
    [junit] Java heap space
    [junit] java.lang.OutOfMemoryError: Java heap space


    [junit] Test org.apache.lucene.document.TestBigBinary FAILED
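
The attached TestBigBinary.java is not reproduced in this message. As a rough sketch only, the kind of loop described above would look something like the following against the Lucene 1.9-era API (the directory path, field name, and iteration count here are illustrative, not taken from the attachment):

    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class BigBinarySketch {
      public static void main(String[] args) throws Exception {
        // open a fresh index in a scratch directory (path is illustrative)
        IndexWriter writer = new IndexWriter(
            FSDirectory.getDirectory("/tmp/bigbinary-sketch", true),
            new SimpleAnalyzer(), true);

        int size = 5 * 1024 * 1024;        // 5mb, the size at which the OOM shows up
        for (int i = 0; i < 100; i++) {
          byte[] data = new byte[size];    // a fresh buffer per doc, as in the test
          Document doc = new Document();
          doc.add(new Field("data", data, Field.Store.YES));  // stored binary field
          writer.addDocument(doc);         // dies with OutOfMemoryError after ~9 docs
        }
        writer.close();
      }
    }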


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

Posted by "Daniel Naber (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LUCENE-488?page=comments#action_12363524 ] 

Daniel Naber commented on LUCENE-488:
-------------------------------------

writer.setMaxBufferedDocs(5) solves the OOM error, at least for binary content around 5MB. So with writer.setMaxBufferedDocs(1) you can probably add documents that are almost as big as your JVM's maximum heap, I guess.
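
For reference, a minimal sketch of that workaround (the directory and analyzer here are just placeholders; the default in 1.9 is, I believe, 10 buffered docs):

    // flush buffered documents to disk more often so the large stored
    // fields don't pile up in RAM -- a workaround, not a fix
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true);
    writer.setMaxBufferedDocs(5);
    // ... writer.addDocument(doc) as usual ...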



[jira] Resolved: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-488.
--------------------------------

    Resolution: Fixed

This problem was resolved by LUCENE-843, after which stored fields are written directly into the directory (and therefore no longer consume aggregated RAM).

It is interesting that the test provided here was allocating a new byte buffer of 2 - 10 MB for each added doc. This by itself could eventually lead to OOMs, because as the program ran longer it became harder to allocate consecutive chunks of those sizes. Enhancing binary fields with offset and length (?) would allow applications to reuse the input byte array and allocate fewer of those.
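
As a sketch of what that reuse might look like, assuming a binary Field constructor that accepts the byte array plus an offset and length (not part of the 1.9 API; the helper and document count below are hypothetical):

    // reuse one large scratch buffer instead of allocating a fresh byte[] per document
    byte[] buffer = new byte[10 * 1024 * 1024];
    for (int i = 0; i < numDocs; i++) {
      int length = fillWithContent(buffer, i);  // hypothetical helper, returns bytes written
      Document doc = new Document();
      doc.add(new Field("data", buffer, 0, length, Field.Store.YES));
      writer.addDocument(doc);
    }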



[jira] Commented: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

Posted by "Daniel Naber (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LUCENE-488?page=comments#action_12363568 ] 

Daniel Naber commented on LUCENE-488:
-------------------------------------

writer.setMaxBufferedDocs(1) was a bad idea; it doesn't work because of an off-by-one bug. writer.setMaxBufferedDocs(2) should work, but I had to stop the unit test because the many disk accesses make it too slow. Other things to try (a combined sketch follows below):

-get a stack trace of the OOM (requires Java 1.5)
-use writer.setUseCompoundFile(false) and look at the index directory after the crash
-use writer.setInfoStream(System.out) to get some (not much) more output from Lucene

BTW, this seems to affect all big stored fields, not just binary fields.

(Please reply here in the issue tracker, not on the mailing list. This way things can be properly tracked.)
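
A combined sketch of those suggestions, applied to an already-opened writer (nothing here beyond the calls named above):

    // debugging aids while chasing the OOM, not fixes
    writer.setMaxBufferedDocs(2);        // flush after every couple of documents
    writer.setUseCompoundFile(false);    // leave the individual index files visible after a crash
    writer.setInfoStream(System.out);    // a little extra output from IndexWriter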




[jira] Commented: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

Posted by "george washington (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LUCENE-488?page=comments#action_12363628 ] 

george washington commented on LUCENE-488:
------------------------------------------

Daniel, a combination of :

      iwriter.setMaxBufferedDocs(2);
      iwriter.setMergeFactor(2);
      iwriter.setUseCompoundFile(false);

seems to help. I still get OOM errors, but only with several larger docs (>10MB) in succession, which is a significant improvement over the 5MB limit.
Perhaps this issue should be kept open until a more satisfactory solution is found.
Thank you for your help.




[jira] Updated: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-488?page=all ]

Hoss Man updated LUCENE-488:
----------------------------

    Attachment: TestBigBinary.java

two things I forgot to mention before...

1) It seems I can add as many 4mb documents as my heart desires, but once I go up to 5 all hell breaks loose.

2) I didn't try playing with the various IndexWriter options to see what effect they had on the breaking point.

