Posted to java-user@lucene.apache.org by stefan <st...@intermediate.de> on 2009/06/24 10:08:43 UTC
OutOfMemoryError using IndexWriter
Hi,
I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
I keep getting an OutOfMemoryError if I re-use the same IndexWriter to index the complete database, even though re-using the writer is what the performance hints recommend.
What I do now is close the index every 10000 objects (and optimize it on every 50th close) and create a new IndexWriter to continue. This process works fine, but it hardly seems like the recommended way to go.
I've been using jhat/jmap as well as the NetBeans profiler, and I am fairly sure that this is a problem related to Lucene.
Any ideas, or should I post this to Jira? Jira has quite a few OutOfMemory reports, but they all seem to be closed as of version 2.4.1.
Thanks,
Stefan
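The close-and-reopen workaround described above can be sketched roughly as follows with the Lucene 2.4 API. This is only an illustration of the pattern, not Stefan's actual code: the index path, the batch iteration, and the document source are assumptions.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class BatchedIndexer {
    // Workaround pattern: close the writer after every batch of 10000 docs,
    // optimize on every 50th close, then open a fresh IndexWriter.
    public static void index(String indexDir, Iterable<List<Document>> batches)
            throws IOException {
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        int closes = 0;
        for (List<Document> batch : batches) {    // each batch ~10000 objects
            for (Document doc : batch) {
                writer.addDocument(doc);
            }
            if (++closes % 50 == 0) {
                writer.optimize();                // expensive full merge
            }
            writer.close();                       // flushes buffered state
            writer = new IndexWriter(indexDir, new StandardAnalyzer(),
                    false, IndexWriter.MaxFieldLength.UNLIMITED);
        }
        writer.close();
    }
}
```

As the thread goes on to establish, calling commit() periodically on a single long-lived writer achieves much the same memory release without the cost of tearing the writer down.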
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: OutOfMemoryError using IndexWriter
Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
Hi Stefan,
Generally, I run the memory monitor and watch how much is being used; the monitor shows the total memory usage of the application. As long as it stays below 1.8GB, things are fine.
In your case, you are assuming that the program takes only 50MB, which may be very different from the actual usage. In my observations, memory usage by the application itself rarely caused issues; the only thing that affected us was the Lucene index file(s). Our physical memory is 4GB (even though the JVM sees only about 1.8GB).
Sincerely,
Sithu
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 24, 2009 at 10:23 AM, stefan<st...@intermediate.de> wrote:
> does Lucene keep the complete index in memory ?
No.
Certain things (deleted docs, norms, field cache, terms index) are
loaded into memory, but these are tiny compared to what's not loaded
into memory (postings, stored docs, term vectors).
> As stated before the result index is 50MB, this would correlate with the memory footprint required by Lucene as seen in my app:
> jvm 120MB - 50MB(Lucene) - 50MB(my App) = something left
> jvm 100MB - 50MB(Lucene) - 50MB(my App) = OOError
I think this is just coincidence.
Mike
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
does Lucene keep the complete index in memory?
As stated before, the resulting index is 50MB; this would correlate with the memory footprint required by Lucene as seen in my app:
jvm 120MB - 50MB (Lucene) - 50MB (my app) = something left
jvm 100MB - 50MB (Lucene) - 50MB (my app) = OutOfMemoryError
Though some hint from the programming side on whether this is actually the case would be appreciated ...
Stefan
RE: OutOfMemoryError using IndexWriter
Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
It happens when the segments are merged but not yet optimized. It happened to our program at 1.8GB; we now develop and test on Win32 but run the code on Linux, which seems to handle at least up to 3GB of index.
In fact, once the index size is beyond 1.8GB, even Luke throws a Java heap error if I try to open the index.
Please post your results/views.
Sincerely,
Sithu
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
I do use Win32.
What do you mean by "the index file before
optimizations crosses your jvm memory usage settings (if say 512MB)" ?
Could you please further explain this ?
Stefan
RE: OutOfMemoryError using IndexWriter
Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
Hi Stefan,
Are you using 32-bit Windows? If so, this can sometimes happen when the index size before optimization crosses your JVM memory setting (say, 512MB).
Increase the JVM memory settings if that is the case.
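Increasing the heap is done with the JVM's -Xmx flag at launch; for example (the 1024m value and the class/jar names are illustrative only, and on 32-bit Windows the practical heap ceiling is roughly 1.5 to 1.8GB regardless of physical RAM):

```shell
# Raise the maximum heap to 1GB for the indexing run (example values)
java -Xms256m -Xmx1024m -cp myapp.jar com.example.Indexer
```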
Sincerely,
Sithu D Sudarsan
Off: 301-796-2587
sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
OK it looks like no merging was done.
I think the next step is to call
IndexWriter.setMaxBufferedDeleteTerms(1000) and see if that prevents
the OOM.
Mike
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
Here are the results of CheckIndex. I ran it just after I got the OOM error.
OK [4 fields]
test: terms, freq, prox...OK [509534 terms; 9126904 terms/docs pairs; 4933036 tokens]
test: stored fields.......OK [148124 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
2 of 7: name=_b docCount=17724
compound=true
hasProx=true
numFiles=2
size (MB)=4,514
docStoreOffset=0
docStoreSegment=_b
docStoreIsCompoundFile=true
no deletions
test: open reader.........OK
test: fields, norms.......OK [4 fields]
test: terms, freq, prox...OK [122054 terms; 1022477 terms/docs pairs; 1560703 tokens]
test: stored fields.......OK [35448 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
3 of 7: name=_c docCount=15952
compound=true
hasProx=true
numFiles=2
size (MB)=4,539
docStoreOffset=17724
docStoreSegment=_b
docStoreIsCompoundFile=true
no deletions
test: open reader.........OK
test: fields, norms.......OK [4 fields]
test: terms, freq, prox...OK [125512 terms; 1047363 terms/docs pairs; 1535701 tokens]
test: stored fields.......OK [31904 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
4 of 7: name=_d docCount=19975
compound=true
hasProx=true
numFiles=2
size (MB)=5,547
docStoreOffset=33676
docStoreSegment=_b
docStoreIsCompoundFile=true
no deletions
test: open reader.........OK
test: fields, norms.......OK [4 fields]
test: terms, freq, prox...OK [101563 terms; 1327972 terms/docs pairs; 2390213 tokens]
test: stored fields.......OK [39950 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
5 of 7: name=_e docCount=24740
compound=true
hasProx=true
numFiles=2
size (MB)=5,458
docStoreOffset=53651
docStoreSegment=_b
docStoreIsCompoundFile=true
no deletions
test: open reader.........OK
test: fields, norms.......OK [4 fields]
test: terms, freq, prox...OK [94791 terms; 1290085 terms/docs pairs; 2501794 tokens]
test: stored fields.......OK [49480 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
6 of 7: name=_f docCount=21584
compound=true
hasProx=true
numFiles=2
size (MB)=5,914
docStoreOffset=0
docStoreSegment=_f
docStoreIsCompoundFile=true
no deletions
test: open reader.........OK
test: fields, norms.......OK [4 fields]
test: terms, freq, prox...OK [92162 terms; 1267882 terms/docs pairs; 2570682 tokens]
test: stored fields.......OK [43168 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
7 of 7: name=_g docCount=13600
compound=true
hasProx=true
numFiles=2
size (MB)=1,664
docStoreOffset=21584
docStoreSegment=_f
docStoreIsCompoundFile=true
no deletions
test: open reader.........OK
test: fields, norms.......OK [4 fields]
test: terms, freq, prox...OK [42087 terms; 326152 terms/docs pairs; 667302 tokens]
test: stored fields.......OK [27200 total field count; avg 2 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
No problems were detected with this index.
Stefan
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Thu, Jun 25, 2009 at 3:02 AM, stefan<st...@intermediate.de> wrote:
>>But a "leak" would keep leaking over time, right? Ie even a 1 GB heap
>>on your test db should eventually throw OOME if there's really a leak.
> No, not necessarily, since I stop indexing once everything is indexed. I shall try repeated runs with 120MB.
It indeed looks like IndexWriter doesn't account for the RAM used by
pending deletes. I've opened
https://issues.apache.org/jira/browse/LUCENE-1717 for this, though
I'd normally expect the amount of extra RAM used to be smallish...
Do you have a high merge factor? Can you run CheckIndex on your index
(java org.apache.lucene.index.CheckIndex /path/to/index) and post the
output back?
Currently IndexWriter will flush these deletes on kicking off a merge,
or if commit() is called, so one workaround you could try is to call
commit() every so often and see if that improves the RAM usage?
>>Are you calling updateDocument (which deletes then adds)?
> Yes I do; in my code I do not know whether the document is already indexed or not. In my test case I delete the
> complete index before the run, so all documents should be new to the index. I still use update, though, since
> this piece of code is generic.
OK that's a good reason to use updateDocument.
Mike
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
>But a "leak" would keep leaking over time, right? Ie even a 1 GB heap
>on your test db should eventually throw OOME if there's really a leak.
No, not necessarily, since I stop indexing once everything is indexed. I shall try repeated runs with 120MB.
>Are you calling updateDocument (which deletes then adds)?
Yes I do; in my code I do not know whether the document is already indexed or not. In my test case I delete the
complete index before the run, so all documents should be new to the index. I still use update, though, since
this piece of code is generic.
I shall post my results.
Stefan
Re: OutOfMemoryError using IndexWriter
Posted by Simon Willnauer <si...@googlemail.com>.
On Thu, Jun 25, 2009 at 1:13 PM, Michael
McCandless<lu...@mikemccandless.com> wrote:
> Can you post your test code? If you can make it a standalone test,
> then I can repro and dig down faster.
>
> Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
> and see if that prevents the OOM?
>
> Mike
>
> On Thu, Jun 25, 2009 at 7:10 AM, stefan<st...@intermediate.de> wrote:
>>
>> Hi Mike,
>>
>> I just changed my test-code to run in an indefinite loop over the database to index everything. Set the jvm to 120MB heap size, all other parameters as before.
>> I got an OOError just as before - so I would say there is a leak somewhere.
>>
>> Here is the histogram.
>>
>> Heap Histogram
>>
>> All Classes (excluding platform)
>> Class                                                          Instance Count  Total Size
>> class [B                                                              1809102    41992326
>> class [C                                                               200610    26877068
>> class [[B                                                               46117     9473872
>> class java.lang.String                                                 198629     3178064
>> class org.apache.lucene.index.FreqProxTermsWriter$PostingList          100927     2825956
>> class [Ljava.util.HashMap$Entry;                                        11329     2494312
>> class [I                                                                 5186     2097300
>> class java.util.HashMap$Entry                                          132578     2121248
I would be interested in what happens to DocumentsWriter#deletesInRam.
Can you check whether it keeps growing even after flushes?
simon
>>
>> So far I had no success in pinpointing those binary arrays, I will need some more time for this.
>>
>> Stefan
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Gesendet: Mi 24.06.2009 17:50
>> An: java-user@lucene.apache.org
>> Betreff: Re: OutOfMemoryError using IndexWriter
>>
>> On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>>>
>>> Hi,
>>>
>>>
>>>>OK so this means it's not a leak, and instead it's just that stuff is
>>>>consuming more RAM than expected.
>>> Or that my test db is smaller than the production db which is indeed the case.
>>
>> But a "leak" would keep leaking over time, right? Ie even a 1 GB heap
>> on your test db should eventually throw OOME if there's really a leak.
>>
>>> Please explain those buffered deletes in a few more details.
>>
>> Are you calling updateDocument (which deletes then adds)?
>>
>> Deletes (the Term or Query you pass to updateDocument or
>> deleteDocuments) are buffered in a HashMap and then that buffer is
>> materialized into actual deleted doc IDs when IndexWriter decides to
>> do so. I think IndexWriter isn't properly flushing the deletes when
>> they use too much RAM.
>>
>> Mike
>>
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi Mike,
I ran a burn-in test overnight, repeatedly indexing the same db in a loop. I set the heap size to 120MB, called setMaxBufferedDeleteTerms(1000), did not call commit, and used the same IndexWriter throughout.
This test passed without any errors.
So, to wrap this up: I shall call commit every 10000 documents added. The performance impact of calling setMaxBufferedDeleteTerms is quite significant (I guess somewhere between a factor of 2 and 10), so I'll check on the speed of commit instead.
Commit seems to be able to free more memory (since the run succeeds with a 100MB heap).
I'll have to give up on tracking down those binary references; I have spent too much time on this already.
Thanks a lot for your insights.
Stefan
-----Urspr�ngliche Nachricht-----
Von: Michael McCandless [mailto:lucene@mikemccandless.com]
Gesendet: Do 25.06.2009 15:57
An: java-user@lucene.apache.org
Betreff: Re: OutOfMemoryError using IndexWriter
Interesting that excessive deletes buffering is not your problem...
Even if you can't post the resulting test case, if you can simplify it
& run locally, to rule out anything outside Lucene that's allocating
the byte/char/byte[] arrays, that can help to isolate.
Also, profilers can trace where allocations happened, eg YourKit.
Mike
On Thu, Jun 25, 2009 at 9:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I'm afraid my test setup and code this is far too big.
> What I use lucene for is fairly simple. I have a database with about 150 tables, I iterate all tables and create for each row a String representation similar to a toString method containing all database data. This string is then fed together with the primary key to lucene. Full-text search of my db is then possible. Each document in Lucene represents a row in the database.
>
> I tried calling setMaxBufferedDeleteTerms �with 100MB heap size to no avail, but calling commit every 10000 documents does help. I assume a commit is similar to creating a new IndexWriter.
>
> HTH,
>
> Stefan
>
>
>
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Thu 25.06.2009 13:13
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> Can you post your test code? If you can make it a standalone test,
> then I can repro and dig down faster.
>
> Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
> and see if that prevents the OOM?
>
> Mike
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
Interesting that excessive delete buffering is not your problem...
Even if you can't post the resulting test case, simplifying it and
running it locally, to rule out anything outside Lucene that's allocating
the byte[]/char[] arrays, can help to isolate the problem.
Also, profilers can trace where allocations happened, eg YourKit.
Mike
On Thu, Jun 25, 2009 at 9:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I'm afraid my test setup and code are far too big to post.
> What I use Lucene for is fairly simple. I have a database with about 150 tables; I iterate over all tables and create for each row a String representation, similar to a toString method, containing all of the row's data. This string is then fed to Lucene together with the primary key. Full-text search of my db is then possible. Each document in Lucene represents a row in the database.
>
> I tried calling setMaxBufferedDeleteTerms with a 100 MB heap size to no avail, but calling commit every 10000 documents does help. I assume a commit is similar to creating a new IndexWriter.
>
> HTH,
>
> Stefan
>
>
>
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Thu 25.06.2009 13:13
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> Can you post your test code? If you can make it a standalone test,
> then I can repro and dig down faster.
>
> Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
> and see if that prevents the OOM?
>
> Mike
>
>
>
>
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
I'm afraid my test setup and code are far too big to post.
What I use Lucene for is fairly simple. I have a database with about 150 tables; I iterate over all tables and create for each row a String representation, similar to a toString method, containing all of the row's data. This string is then fed to Lucene together with the primary key. Full-text search of my db is then possible. Each document in Lucene represents a row in the database.
I tried calling setMaxBufferedDeleteTerms with a 100 MB heap size to no avail, but calling commit every 10000 documents does help. I assume a commit is similar to creating a new IndexWriter.
HTH,
Stefan
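The row-to-string step Stefan describes can be sketched like this (RowIndexer and rowToText are illustrative names, not part of his actual application):

```java
// Flatten one database row into a single full-text string,
// similar to a toString over all columns, as described above.
class RowIndexer {
    static String rowToText(java.util.Map<String, Object> row) {
        StringBuilder sb = new StringBuilder();
        for (Object value : row.values()) {
            if (value != null) sb.append(value).append(' ');  // skip NULL columns
        }
        return sb.toString().trim();
    }
}
```

In real code, this string plus the row's primary key would go into a Lucene Document (an unanalyzed key field and an analyzed text field) and be handed to IndexWriter.updateDocument, so re-indexing a row replaces its previous entry instead of adding a duplicate.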
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Thu 25.06.2009 13:13
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
Can you post your test code? If you can make it a standalone test,
then I can repro and dig down faster.
Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
and see if that prevents the OOM?
Mike
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
Can you post your test code? If you can make it a standalone test,
then I can repro and dig down faster.
Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
and see if that prevents the OOM?
Mike
On Thu, Jun 25, 2009 at 7:10 AM, stefan<st...@intermediate.de> wrote:
>
> Hi Mike,
>
> I just changed my test code to run in an indefinite loop over the database to index everything. I set the JVM to a 120 MB heap size, all other parameters as before.
> I got an OOM error just as before - so I would say there is a leak somewhere.
>
> Here is the histogram.
>
> Heap Histogram
>
> All Classes (excluding platform)
> Class Instance Count Total Size
> class [B 1809102 41992326
> class [C 200610 26877068
> class [[B 46117 9473872
> class java.lang.String 198629 3178064
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList 100927 2825956
> class [Ljava.util.HashMap$Entry; 11329 2494312
> class java.util.HashMap$Entry 132578 2121248
> class [I 5186 2097300
>
> So far I have had no success pinpointing those byte arrays; I will need some more time for this.
>
> Stefan
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wed 24.06.2009 17:50
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>>
>> Hi,
>>
>>
>>>OK so this means it's not a leak, and instead it's just that stuff is
>>>consuming more RAM than expected.
>> Or that my test db is smaller than the production db which is indeed the case.
>
> But a "leak" would keep leaking over time, right? Ie even a 1 GB heap
> on your test db should eventually throw OOME if there's really a leak.
>
>> Please explain those buffered deletes in a few more details.
>
> Are you calling updateDocument (which deletes then adds)?
>
> Deletes (the Term or Query you pass to updateDocument or
> deleteDocuments) are buffered in a HashMap and then that buffer is
> materialized into actual deleted doc IDs when IndexWriter decides to
> do so. I think IndexWriter isn't properly flushing the deletes when
> they use too much RAM.
>
> Mike
>
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi Mike,
I just changed my test code to run in an indefinite loop over the database to index everything. I set the JVM to a 120 MB heap size, all other parameters as before.
I got an OOM error just as before - so I would say there is a leak somewhere.
Here is the histogram.
Heap Histogram
All Classes (excluding platform)
Class Instance Count Total Size
class [B 1809102 41992326
class [C 200610 26877068
class [[B 46117 9473872
class java.lang.String 198629 3178064
class org.apache.lucene.index.FreqProxTermsWriter$PostingList 100927 2825956
class [Ljava.util.HashMap$Entry; 11329 2494312
class java.util.HashMap$Entry 132578 2121248
class [I 5186 2097300
So far I have had no success pinpointing those byte arrays; I will need some more time for this.
Stefan
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wed 24.06.2009 17:50
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>
> Hi,
>
>
>>OK so this means it's not a leak, and instead it's just that stuff is
>>consuming more RAM than expected.
> Or that my test db is smaller than the production db which is indeed the case.
But a "leak" would keep leaking over time, right? I.e., even a 1 GB heap
on your test db should eventually throw an OOME if there's really a leak.
> Please explain those buffered deletes in a few more details.
Are you calling updateDocument (which deletes then adds)?
Deletes (the Term or Query you pass to updateDocument or
deleteDocuments) are buffered in a HashMap and then that buffer is
materialized into actual deleted doc IDs when IndexWriter decides to
do so. I think IndexWriter isn't properly flushing the deletes when
they use too much RAM.
Mike
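A toy model of the delete buffering Mike describes (class and method names here are illustrative, not Lucene's actual internals): each updateDocument buffers its delete Term in a map, and the buffer is materialized and cleared once it reaches maxBufferedDeleteTerms, which is exactly the knob setMaxBufferedDeleteTerms controls.

```java
// Illustrative model of delete-term buffering: RAM grows with each
// buffered term until the buffer is flushed at a configured threshold.
class DeleteBuffer {
    private final java.util.Map<String, Integer> buffered = new java.util.HashMap<>();
    private final int maxBufferedDeleteTerms;
    int flushes = 0;

    DeleteBuffer(int maxBufferedDeleteTerms) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    void bufferDelete(String term, int docId) {
        buffered.put(term, docId);   // RAM grows with each unique term
        if (buffered.size() >= maxBufferedDeleteTerms) flush();
    }

    void flush() {
        buffered.clear();            // resolve to deleted doc IDs, free the RAM
        flushes++;
    }

    int pending() { return buffered.size(); }
}
```

This is why an unbounded stream of updateDocument calls with a unique primary-key term per row can grow the heap even though the 16 MB document buffer is respected.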
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>
> Hi,
>
>
>>OK so this means it's not a leak, and instead it's just that stuff is
>>consuming more RAM than expected.
> Or that my test db is smaller than the production db which is indeed the case.
But a "leak" would keep leaking over time, right? I.e., even a 1 GB heap
on your test db should eventually throw an OOME if there's really a leak.
> Please explain those buffered deletes in a few more details.
Are you calling updateDocument (which deletes then adds)?
Deletes (the Term or Query you pass to updateDocument or
deleteDocuments) are buffered in a HashMap and then that buffer is
materialized into actual deleted doc IDs when IndexWriter decides to
do so. I think IndexWriter isn't properly flushing the deletes when
they use too much RAM.
Mike
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
>OK so this means it's not a leak, and instead it's just that stuff is
>consuming more RAM than expected.
Or that my test db is smaller than the production db, which is indeed the case.
>Hmm -- there are quite a few buffered deletes pending. It could be we
>are under-accounting for RAM used by buffered deletes. I'll dig on
>that.
That sounds promising to me. How does a delete happen, though? We are talking about a complete re-indexing from scratch; no deletes at all should be happening. Are you saying that I remove docs from the index?
>Also, your char[]'s are taking ~30 MB, byte[] ~26MB, which is odd if
>your RAM buffer is 16MB. Does your app create these?
A fair amount is created by my app - a histogram taken without indexing shows about 10 MB of char[] created by my app:
class [C 132399 9457148,
though a few more could be created during indexing.
> Why is it that creating a new IndexWriter lets the indexing run fine with 80 MB, but keeping it causes an
> OutOfMemoryError with a 100 MB heap size?
Please explain those buffered deletes in a bit more detail.
Thanks,
Stefan
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 24, 2009 at 7:43 AM, stefan<st...@intermediate.de> wrote:
> I tried with 100MB heap size and got the Error as well, it runs fine with 120MB.
OK so this means it's not a leak, and instead it's just that stuff is
consuming more RAM than expected.
> Here is the histogram (application classes marked with --)
>
> Heap Histogram
>
> All Classes (excluding platform)
> Class Instance Count Total Size
> class [C 234200 30245722
> class [B 1087565 25999145
> class [[B 28430 4890060
> class java.lang.String 232351 3717616
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList 99584 2788352
> class java.util.HashMap$Entry 171031 2736496
> class [Ljava.util.HashMap$Entry; 9563 2371256
> class [Ljava.lang.Object; 31820 1820224
> class --- 4474 1753808
> class [I 4337 1567796
> class java.lang.reflect.Method 19774 1364406
> class org.apache.lucene.index.Term 117982 943856
> class [Lorg.apache.lucene.index.RawPostingList; 12 770012
> class --- 1837 490479
> class org.apache.lucene.index.BufferedDeletes$Num 117303 469212
>
> The --- as well was the reflect.Method are part of the app's data.
Hmm -- there are quite a few buffered deletes pending. It could be we
are under-accounting for RAM used by buffered deletes. I'll dig on
that.
Also, your char[]'s are taking ~30 MB, byte[] ~26MB, which is odd if
your RAM buffer is 16MB. Does your app create these?
> Why is it, that creating a new Index Writer will let the indexing run fine with 80MB, but keeping it will create an
> OutOfMemoryException running with 100MB heap size ?
Could be because you are buffering so many deletes (if indeed Lucene
doesn't account for that RAM consumption properly)...
Mike
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
I tried with a 100 MB heap size and got the error as well; it runs fine with 120 MB.
Here is the histogram (application classes marked with --)
Heap Histogram
All Classes (excluding platform)
Class Instance Count Total Size
class [C 234200 30245722
class [B 1087565 25999145
class [[B 28430 4890060
class java.lang.String 232351 3717616
class org.apache.lucene.index.FreqProxTermsWriter$PostingList 99584 2788352
class java.util.HashMap$Entry 171031 2736496
class [Ljava.util.HashMap$Entry; 9563 2371256
class [Ljava.lang.Object; 31820 1820224
class --- 4474 1753808
class [I 4337 1567796
class java.lang.reflect.Method 19774 1364406
class org.apache.lucene.index.Term 117982 943856
class [Lorg.apache.lucene.index.RawPostingList; 12 770012
class --- 1837 490479
class org.apache.lucene.index.BufferedDeletes$Num 117303 469212
The --- as well was the reflect.Method are part of the app's data.
Why is it that creating a new IndexWriter lets the indexing run fine with 80 MB, but keeping it causes an
OutOfMemoryError with a 100 MB heap size?
Stefan
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wed 24.06.2009 11:52
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
Hmm -- I think your test env (80 MB heap, 50 MB used by app + 16 MB
IndexWriter RAM buffer) is a bit too tight. The 16 MB buffer for IW
is not a hard upper bound on how much RAM it may use. EG when merges
are running, more RAM will be required, if a large doc brought it over
the 16 MB limit it will consume more, etc.
~3 MB used by PostingList is reasonable.
If after fixing the problem in your code, with a larger heap size
you're still running out of RAM, then please post the full histogram
from the resulting heap dump at which point the offender will be
obvious.
Or, can you make the problem happen with a smallish test case?
Mike
On Wed, Jun 24, 2009 at 5:37 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I do not set a RAM Buffer size, I assume default is 16MB.
> My server runs with 80MB heap size, before starting lucene about 50MB is used. In a production environment I run in this problem with heap size set to 750MB with no other activity on the server (nighttime), though since then I diagnosed some problem with my code as well. I just reproduced it with 80MB but I guess I can reproduce it with 100MB heap as well, just takes longer.
>
> Here is the stack, I keep the dump for
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to c:\auto_heap_intern.prof ...
> Heap dump file created [97173841 bytes in 3.534 secs]
> ERROR lucene.SearchManager - Failure in index daemon:
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.HashSet.<init>(HashSet.java:86)
>        at org.apache.lucene.index.DocumentsWriter.initFlushState(DocumentsWriter.java:540)
>        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1703)
>        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3534)
>        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>
> Heap Histogram shows:
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList 116736 (instances) 3268608 (size)
>
> Well, something I should do differently ?
>
> Stefan
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wed 24.06.2009 10:48
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> How large is the RAM buffer that you're giving IndexWriter? How large
> a heap size do you give to JVM?
>
> Can you post one of the OOM exceptions you're hitting?
>
> Mike
>
> On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
>> Hi,
>>
>> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
>> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
>> recommended in the performance hints.
>> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
>> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
>> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>>
>> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>>
>> Thanks,
>>
>> Stefan
>>
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmm -- I think your test env (80 MB heap, 50 MB used by app + 16 MB
IndexWriter RAM buffer) is a bit too tight. The 16 MB buffer for IW
is not a hard upper bound on how much RAM it may use. EG when merges
are running, more RAM will be required, if a large doc brought it over
the 16 MB limit it will consume more, etc.
~3 MB used by PostingList is reasonable.
If after fixing the problem in your code, with a larger heap size
you're still running out of RAM, then please post the full histogram
from the resulting heap dump at which point the offender will be
obvious.
Or, can you make the problem happen with a smallish test case?
Mike
On Wed, Jun 24, 2009 at 5:37 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I do not set a RAM Buffer size, I assume default is 16MB.
> My server runs with 80MB heap size, before starting lucene about 50MB is used. In a production environment I run in this problem with heap size set to 750MB with no other activity on the server (nighttime), though since then I diagnosed some problem with my code as well. I just reproduced it with 80MB but I guess I can reproduce it with 100MB heap as well, just takes longer.
>
> Here is the stack, I keep the dump for
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to c:\auto_heap_intern.prof ...
> Heap dump file created [97173841 bytes in 3.534 secs]
> ERROR lucene.SearchManager - Failure in index daemon:
> java.lang.OutOfMemoryError: Java heap space
> at java.util.HashSet.<init>(HashSet.java:86)
> at org.apache.lucene.index.DocumentsWriter.initFlushState(DocumentsWriter.java:540)
> at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
> at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1703)
> at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3534)
> at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
> at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
> at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
> at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>
> Heap Histogram shows:
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList 116736 (instances) 3268608 (size)
>
> Well, something I should do differently ?
>
> Stefan
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wed 24.06.2009 10:48
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> How large is the RAM buffer that you're giving IndexWriter? How large
> a heap size do you give to JVM?
>
> Can you post one of the OOM exceptions you're hitting?
>
> Mike
>
> On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
>> Hi,
>>
>> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
>> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
>> recommended in the performance hints.
>> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
>> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
>> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>>
>> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>>
>> Thanks,
>>
>> Stefan
>>
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
I do not set a RAM buffer size; I assume the default is 16 MB.
My server runs with an 80 MB heap size; before starting Lucene, about 50 MB is used. In a production environment I ran into this problem with the heap size set to 750 MB and no other activity on the server (nighttime), though since then I have diagnosed some problems in my own code as well. I just reproduced it with 80 MB, but I guess I can reproduce it with a 100 MB heap as well; it just takes longer.
Here is the stack trace (I am keeping the heap dump):
java.lang.OutOfMemoryError: Java heap space
Dumping heap to c:\auto_heap_intern.prof ...
Heap dump file created [97173841 bytes in 3.534 secs]
ERROR lucene.SearchManager - Failure in index daemon:
java.lang.OutOfMemoryError: Java heap space
at java.util.HashSet.<init>(HashSet.java:86)
at org.apache.lucene.index.DocumentsWriter.initFlushState(DocumentsWriter.java:540)
at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1703)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3534)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
Heap Histogram shows:
class org.apache.lucene.index.FreqProxTermsWriter$PostingList 116736 (instances) 3268608 (size)
Well, is there something I should do differently?
Stefan
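For reference, the buffer-related knobs discussed in this thread can be set explicitly on a Lucene 2.4-era IndexWriter. This is only a configuration sketch: dir and analyzer are assumed to exist, and the values shown are examples, not recommendations.

```java
// Sketch only: assumes an existing Directory 'dir' and Analyzer 'analyzer'.
IndexWriter writer = new IndexWriter(dir, analyzer,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(32.0);          // default is 16 MB in 2.4
writer.setMaxBufferedDeleteTerms(1000);   // cap buffered delete terms
```

Note that, as Mike says below, the RAM buffer size is a flush trigger rather than a hard upper bound on the writer's memory use.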
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wed 24.06.2009 10:48
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
How large is the RAM buffer that you're giving IndexWriter? How large
a heap size do you give to JVM?
Can you post one of the OOM exceptions you're hitting?
Mike
On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>
> Thanks,
>
> Stefan
>
Re: OutOfMemoryError using IndexWriter
Posted by Michael McCandless <lu...@mikemccandless.com>.
How large is the RAM buffer that you're giving IndexWriter? How large
a heap size do you give the JVM?
Can you post one of the OOM exceptions you're hitting?
Mike
On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>
> Thanks,
>
> Stefan
>
AW: OutOfMemoryError using IndexWriter
Posted by stefan <st...@intermediate.de>.
Hi,
there seems to be a little misunderstanding. The index is only optimized when the IndexWriter is about to be closed, and then only with a probability of 2% (meaning occasionally).
In other words, I only close the IndexWriter (and thus optimize) to avoid the OOMError.
When I keep the same IndexWriter for the complete indexing operation, I do not call optimize but get an OOMError.
Stefan
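The close policy Stefan describes can be sketched as follows (ClosePolicy and afterDocument are hypothetical names; the caller would close, and occasionally optimize, its real IndexWriter whenever this returns true):

```java
import java.util.Random;

// Hypothetical sketch of the policy above: close the writer every
// 10,000 objects, and optimize on roughly 2% of those closes.
class ClosePolicy {
    static final int CLOSE_INTERVAL = 10_000;
    static final double OPTIMIZE_PROBABILITY = 0.02;

    private final Random random;
    int closes = 0;
    int optimizes = 0;

    ClosePolicy(Random random) { this.random = random; }

    // Called once per indexed object; returns true when the caller
    // should close (and possibly optimize) its IndexWriter.
    boolean afterDocument(int docsIndexedSoFar) {
        if (docsIndexedSoFar % CLOSE_INTERVAL != 0) return false;
        closes++;
        if (random.nextDouble() < OPTIMIZE_PROBABILITY) optimizes++;
        return true;
    }
}
```

As Otis notes below, the optimize calls are the expensive part; the closes themselves only work around the memory growth.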
-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Wed 24.06.2009 14:22
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
Hi Stefan,
While not directly the source of your problem, I have a feeling you are optimizing too frequently (and wasting time/CPU by doing so). Is there a reason you optimize so often? Try optimizing only at the end, when you know you won't be adding any more documents to the index for a while.
I'm also wondering why you close/open the IndexWriter, as it looks like you are doing batch indexing.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: stefan <st...@intermediate.de>
> To: java-user@lucene.apache.org
> Sent: Wednesday, June 24, 2009 4:08:43 AM
> Subject: OutOfMemoryError using IndexWriter
>
> Hi,
>
> I am using Lucene 2.4.1 to index a database with less than a million records.
> The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index
> the complete database. This is though
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close
> actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the
> recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that
> this is a problem related to Lucene.
>
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but
> they all seem closed in Version 2.4.1.
>
> Thanks,
>
> Stefan
>
Re: OutOfMemoryError using IndexWriter
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Stefan,
While not directly the source of your problem, I have a feeling you are optimizing too frequently (and wasting time/CPU by doing so). Is there a reason you optimize so often? Try optimizing only at the end, when you know you won't be adding any more documents to the index for a while.
I'm also wondering why you close/open the IndexWriter, as it looks like you are doing batch indexing.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: stefan <st...@intermediate.de>
> To: java-user@lucene.apache.org
> Sent: Wednesday, June 24, 2009 4:08:43 AM
> Subject: OutOfMemoryError using IndexWriter
>
> Hi,
>
> I am using Lucene 2.4.1 to index a database with less than a million records.
> The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index
> the complete database. This is though
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close
> actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the
> recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that
> this is a problem related to Lucene.
>
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but
> they all seem closed in Version 2.4.1.
>
> Thanks,
>
> Stefan
>