Posted to java-user@lucene.apache.org by stefan <st...@intermediate.de> on 2009/06/24 10:08:43 UTC

OutOfMemoryError using IndexWriter

Hi,

I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
I keep getting an OutOfMemoryError if I re-use the same IndexWriter to index the complete database, even though re-using the writer is what the performance hints recommend.
What I do now is close the index every 10000 objects (and optimize it on every 50th close) and create a new IndexWriter to continue. This works fine, but it hardly seems the recommended way to go.
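
A rough sketch of that workaround against the Lucene 2.4 API; the analyzer and the document source are placeholders, not the actual code:

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    // Close and re-create the IndexWriter every 10000 documents; optimize on every 50th close.
    static void indexAll(Directory dir, Analyzer analyzer, Iterable<Document> docsFromDb)
            throws IOException {
        IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        int added = 0, closes = 0;
        for (Document doc : docsFromDb) {
            writer.addDocument(doc);
            if (++added % 10000 == 0) {
                if (++closes % 50 == 0) {
                    writer.optimize();             // every 50th close: optimize first
                }
                writer.close();                    // work around the OutOfMemoryError ...
                writer = new IndexWriter(dir, analyzer,
                                         IndexWriter.MaxFieldLength.UNLIMITED);  // ... and continue with a fresh writer
            }
        }
        writer.close();
    }
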
I've been using jhat/jmap as well as the NetBeans profiler and am fairly sure that this is a problem related to Lucene.

Any ideas - or should I post this to Jira? Jira has quite a few OutOfMemory postings, but they all seem closed as of version 2.4.1.

Thanks,

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: OutOfMemoryError using IndexWriter

Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
 
Hi Stefan,

Generally, I run the memory monitor and see how much is being used. What we see on the monitor is the total memory usage of the application. So long as it is less than 1.8GB, things are fine.

In your case, you are assuming that the program takes only 50MB, which may be very different from the actual usage. In my observations, memory usage by the application itself rarely caused issues; the only thing that affected us was the Lucene index file(s). Our physical memory is 4GB (even though the JVM sees only about 1.8GB).

Sincerely,
Sithu 
-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 10:23 AM
To: java-user@lucene.apache.org
Subject: AW: OutOfMemoryError using IndexWriter


Hi,

does Lucene keep the complete index in memory ?
As stated before the result index is 50MB, this would correlate with the memory footprint required by Lucene as seen in my app:
jvm 120MB - 50MB(Lucene) - 50MB(my App) = something left
jvm 100MB - 50MB(Lucene) - 50MB(my App) = OOError

though some hint, whether this is the case, from the programming side would be appreciated ...

Stefan


-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov]
Sent: Wed 24.06.2009 16:18
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
 
When the segments are merged, but not optimized. It happened at 1.8GB to our program, and now we develop and test in Win32 but run the code on Linux, which seems to be handling atleast upto 3GB of index. 

In fact, if the index size if beyond 1.8GB even, Luke throws Java Heap Error, if I try to open.

Please post your results/views.


Sincerely,
Sithu 
-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 10:08 AM
To: java-user@lucene.apache.org
Subject: AW: OutOfMemoryError using IndexWriter


Hi,

I do use Win32.

What do you mean by "the index file before
optimizations crosses your jvm memory usage settings (if say 512MB)" ?

Could you please further explain this ?

Stefan

-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov]
Sent: Wed 24.06.2009 15:55
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
 

 Hi Stefan,

Are you using Windows 32 bit? If so, sometimes, if the index file before
optimizations crosses your jvm memory usage settings (if say 512MB),
there is a possibility of this happening. 

Increase JVM memory settings if that is the case. 


Sincerely,
Sithu D Sudarsan

Off: 301-796-2587

sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu

-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 4:09 AM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError using IndexWriter

Hi,

I am using Lucene 2.4.1 to index a database with less than a million
records. The resulting index is about 50MB in size.
I keep getting an OutOfMemory Error if I re-use the same IndexWriter to
index the complete database. This is though 
recommended in the performance hints.
What I now do is, every 10000 Objects I close the index (and every 50
close actions optimize it) and create a new
IndexWriter to continue. This process works fine, but to me seems hardly
the recommended way to go.
I've been using jhat/jmap as well as Netbeans profiler and am fairly
sure that this is a problem related to Lucene.

Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory
postings but they all seem closed in Version 2.4.1.

Thanks,

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 24, 2009 at 10:23 AM, stefan<st...@intermediate.de> wrote:

> does Lucene keep the complete index in memory ?

No.

Certain things (deleted docs, norms, field cache, terms index) are
loaded into memory, but these are tiny compared to what's not loaded
into memory (postings, stored docs, term vectors).

> As stated before the result index is 50MB, this would correlate with the memory footprint required by Lucene as seen in my app:
> jvm 120MB - 50MB(Lucene) - 50MB(my App) = something left
> jvm 100MB - 50MB(Lucene) - 50MB(my App) = OOError

I think this is just coincidence.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

does Lucene keep the complete index in memory?
As stated before, the resulting index is 50MB, which would correlate with the memory footprint required by Lucene as seen in my app:
jvm 120MB - 50MB(Lucene) - 50MB(my App) = something left
jvm 100MB - 50MB(Lucene) - 50MB(my App) = OOError

A hint from the programming side on whether this is actually the case would be appreciated ...

Stefan


-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov]
Sent: Wed 24.06.2009 16:18
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
 
When the segments are merged, but not optimized. It happened at 1.8GB to our program, and now we develop and test in Win32 but run the code on Linux, which seems to be handling atleast upto 3GB of index. 

In fact, if the index size if beyond 1.8GB even, Luke throws Java Heap Error, if I try to open.

Please post your results/views.


Sincerely,
Sithu 
-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 10:08 AM
To: java-user@lucene.apache.org
Subject: AW: OutOfMemoryError using IndexWriter


Hi,

I do use Win32.

What do you mean by "the index file before
optimizations crosses your jvm memory usage settings (if say 512MB)" ?

Could you please further explain this ?

Stefan

-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov]
Sent: Wed 24.06.2009 15:55
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
 

 Hi Stefan,

Are you using Windows 32 bit? If so, sometimes, if the index file before
optimizations crosses your jvm memory usage settings (if say 512MB),
there is a possibility of this happening. 

Increase JVM memory settings if that is the case. 


Sincerely,
Sithu D Sudarsan

Off: 301-796-2587

sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu

-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 4:09 AM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError using IndexWriter

Hi,

I am using Lucene 2.4.1 to index a database with less than a million
records. The resulting index is about 50MB in size.
I keep getting an OutOfMemory Error if I re-use the same IndexWriter to
index the complete database. This is though 
recommended in the performance hints.
What I now do is, every 10000 Objects I close the index (and every 50
close actions optimize it) and create a new
IndexWriter to continue. This process works fine, but to me seems hardly
the recommended way to go.
I've been using jhat/jmap as well as Netbeans profiler and am fairly
sure that this is a problem related to Lucene.

Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory
postings but they all seem closed in Version 2.4.1.

Thanks,

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org






RE: OutOfMemoryError using IndexWriter

Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
When the segments are merged, but not optimized. It happened to our program at 1.8GB; we now develop and test on Win32 but run the code on Linux, which seems to handle at least up to 3GB of index.

In fact, if the index size is beyond 1.8GB, even Luke throws a Java heap space error if I try to open it.

Please post your results/views.


Sincerely,
Sithu 
-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 10:08 AM
To: java-user@lucene.apache.org
Subject: AW: OutOfMemoryError using IndexWriter


Hi,

I do use Win32.

What do you mean by "the index file before
optimizations crosses your jvm memory usage settings (if say 512MB)" ?

Could you please further explain this ?

Stefan

-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov]
Sent: Wed 24.06.2009 15:55
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
 

 Hi Stefan,

Are you using Windows 32 bit? If so, sometimes, if the index file before
optimizations crosses your jvm memory usage settings (if say 512MB),
there is a possibility of this happening. 

Increase JVM memory settings if that is the case. 


Sincerely,
Sithu D Sudarsan

Off: 301-796-2587

sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu

-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 4:09 AM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError using IndexWriter

Hi,

I am using Lucene 2.4.1 to index a database with less than a million
records. The resulting index is about 50MB in size.
I keep getting an OutOfMemory Error if I re-use the same IndexWriter to
index the complete database. This is though 
recommended in the performance hints.
What I now do is, every 10000 Objects I close the index (and every 50
close actions optimize it) and create a new
IndexWriter to continue. This process works fine, but to me seems hardly
the recommended way to go.
I've been using jhat/jmap as well as Netbeans profiler and am fairly
sure that this is a problem related to Lucene.

Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory
postings but they all seem closed in Version 2.4.1.

Thanks,

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

I do use Win32.

What do you mean by "the index file before
optimizations crosses your jvm memory usage settings (if say 512MB)" ?

Could you please further explain this ?

Stefan

-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov]
Sent: Wed 24.06.2009 15:55
To: java-user@lucene.apache.org
Subject: RE: OutOfMemoryError using IndexWriter
 

 Hi Stefan,

Are you using Windows 32 bit? If so, sometimes, if the index file before
optimizations crosses your jvm memory usage settings (if say 512MB),
there is a possibility of this happening. 

Increase JVM memory settings if that is the case. 


Sincerely,
Sithu D Sudarsan

Off: 301-796-2587

sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu

-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 4:09 AM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError using IndexWriter

Hi,

I am using Lucene 2.4.1 to index a database with less than a million
records. The resulting index is about 50MB in size.
I keep getting an OutOfMemory Error if I re-use the same IndexWriter to
index the complete database. This is though 
recommended in the performance hints.
What I now do is, every 10000 Objects I close the index (and every 50
close actions optimize it) and create a new
IndexWriter to continue. This process works fine, but to me seems hardly
the recommended way to go.
I've been using jhat/jmap as well as Netbeans profiler and am fairly
sure that this is a problem related to Lucene.

Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory
postings but they all seem closed in Version 2.4.1.

Thanks,

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org






RE: OutOfMemoryError using IndexWriter

Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
 Hi Stefan,

Are you using 32-bit Windows? If so, and the index file before optimization grows beyond your JVM memory settings (say 512MB), there is a possibility of this happening.

Increase the JVM memory settings if that is the case.


Sincerely,
Sithu D Sudarsan

Off: 301-796-2587

sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu

-----Original Message-----
From: stefan [mailto:stefan@intermediate.de] 
Sent: Wednesday, June 24, 2009 4:09 AM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError using IndexWriter

Hi,

I am using Lucene 2.4.1 to index a database with less than a million
records. The resulting index is about 50MB in size.
I keep getting an OutOfMemory Error if I re-use the same IndexWriter to
index the complete database. This is though 
recommended in the performance hints.
What I now do is, every 10000 Objects I close the index (and every 50
close actions optimize it) and create a new
IndexWriter to continue. This process works fine, but to me seems hardly
the recommended way to go.
I've been using jhat/jmap as well as Netbeans profiler and am fairly
sure that this is a problem related to Lucene.

Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory
postings but they all seem closed in Version 2.4.1.

Thanks,

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK it looks like no merging was done.

I think the next step is to call
IndexWriter.setMaxBufferedDeleteTerms(1000) and see if that prevents
the OOM.
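
A minimal sketch of that setting, assuming the single long-lived writer from the earlier mails:

    // Flush buffered delete terms once 1000 have accumulated, instead of only
    // when a merge kicks off or commit() is called.
    writer.setMaxBufferedDeleteTerms(1000);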

Mike

On Thu, Jun 25, 2009 at 7:16 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> Here are the result of CheckIndex. I ran this just after I got the OOError.
>
> OK [4 fields]
>    test: terms, freq, prox...OK [509534 terms; 9126904 terms/docs pairs; 4933036 tokens]
>    test: stored fields.......OK [148124 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>  2 of 7: name=_b docCount=17724
>    compound=true
>    hasProx=true
>    numFiles=2
>    size (MB)=4,514
>    docStoreOffset=0
>    docStoreSegment=_b
>    docStoreIsCompoundFile=true
>    no deletions
>    test: open reader.........OK
>    test: fields, norms.......OK [4 fields]
>    test: terms, freq, prox...OK [122054 terms; 1022477 terms/docs pairs; 1560703 tokens]
>    test: stored fields.......OK [35448 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>  3 of 7: name=_c docCount=15952
>    compound=true
>    hasProx=true
>    numFiles=2
>    size (MB)=4,539
>    docStoreOffset=17724
>    docStoreSegment=_b
>    docStoreIsCompoundFile=true
>    no deletions
>    test: open reader.........OK
>    test: fields, norms.......OK [4 fields]
>    test: terms, freq, prox...OK [125512 terms; 1047363 terms/docs pairs; 1535701 tokens]
>    test: stored fields.......OK [31904 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>  4 of 7: name=_d docCount=19975
>    compound=true
>    hasProx=true
>    numFiles=2
>    size (MB)=5,547
>    docStoreOffset=33676
>    docStoreSegment=_b
>    docStoreIsCompoundFile=true
>    no deletions
>    test: open reader.........OK
>    test: fields, norms.......OK [4 fields]
>    test: terms, freq, prox...OK [101563 terms; 1327972 terms/docs pairs; 2390213 tokens]
>    test: stored fields.......OK [39950 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>  5 of 7: name=_e docCount=24740
>    compound=true
>    hasProx=true
>    numFiles=2
>    size (MB)=5,458
>    docStoreOffset=53651
>    docStoreSegment=_b
>    docStoreIsCompoundFile=true
>    no deletions
>    test: open reader.........OK
>    test: fields, norms.......OK [4 fields]
>    test: terms, freq, prox...OK [94791 terms; 1290085 terms/docs pairs; 2501794 tokens]
>    test: stored fields.......OK [49480 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>  6 of 7: name=_f docCount=21584
>    compound=true
>    hasProx=true
>    numFiles=2
>    size (MB)=5,914
>    docStoreOffset=0
>    docStoreSegment=_f
>    docStoreIsCompoundFile=true
>    no deletions
>    test: open reader.........OK
>    test: fields, norms.......OK [4 fields]
>    test: terms, freq, prox...OK [92162 terms; 1267882 terms/docs pairs; 2570682 tokens]
>    test: stored fields.......OK [43168 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>  7 of 7: name=_g docCount=13600
>    compound=true
>    hasProx=true
>    numFiles=2
>    size (MB)=1,664
>    docStoreOffset=21584
>    docStoreSegment=_f
>    docStoreIsCompoundFile=true
>    no deletions
>    test: open reader.........OK
>    test: fields, norms.......OK [4 fields]
>    test: terms, freq, prox...OK [42087 terms; 326152 terms/docs pairs; 667302 tokens]
>    test: stored fields.......OK [27200 total field count; avg 2 fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
> No problems were detected with this index.
>
>
> Stefan
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

Here are the results of CheckIndex. I ran this just after I got the OOError.

OK [4 fields]
    test: terms, freq, prox...OK [509534 terms; 9126904 terms/docs pairs; 4933036 tokens]
    test: stored fields.......OK [148124 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 7: name=_b docCount=17724
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=4,514
    docStoreOffset=0
    docStoreSegment=_b
    docStoreIsCompoundFile=true
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [122054 terms; 1022477 terms/docs pairs; 1560703 tokens]
    test: stored fields.......OK [35448 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  3 of 7: name=_c docCount=15952
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=4,539
    docStoreOffset=17724
    docStoreSegment=_b
    docStoreIsCompoundFile=true
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [125512 terms; 1047363 terms/docs pairs; 1535701 tokens]
    test: stored fields.......OK [31904 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  4 of 7: name=_d docCount=19975
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=5,547
    docStoreOffset=33676
    docStoreSegment=_b
    docStoreIsCompoundFile=true
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [101563 terms; 1327972 terms/docs pairs; 2390213 tokens]
    test: stored fields.......OK [39950 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  5 of 7: name=_e docCount=24740
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=5,458
    docStoreOffset=53651
    docStoreSegment=_b
    docStoreIsCompoundFile=true
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [94791 terms; 1290085 terms/docs pairs; 2501794 tokens]
    test: stored fields.......OK [49480 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  6 of 7: name=_f docCount=21584
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=5,914
    docStoreOffset=0
    docStoreSegment=_f
    docStoreIsCompoundFile=true
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [92162 terms; 1267882 terms/docs pairs; 2570682 tokens]
    test: stored fields.......OK [43168 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  7 of 7: name=_g docCount=13600
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=1,664
    docStoreOffset=21584
    docStoreSegment=_f
    docStoreIsCompoundFile=true
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [42087 terms; 326152 terms/docs pairs; 667302 tokens]
    test: stored fields.......OK [27200 total field count; avg 2 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.


Stefan


Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Thu, Jun 25, 2009 at 3:02 AM, stefan<st...@intermediate.de> wrote:

>>But a "leak" would keep leaking over time, right?  Ie even a 1 GB heap
>>on your test db should eventually throw OOME if there's really a leak.
> No not necessarily, since I stop indexing ones everything is indexed - I shall try repeated runs with 120MB.

It indeed looks like IndexWriter doesn't account for RAM used by
pending deletes.  I've opened
https://issues.apache.org/jira/browse/LUCENE-1717 for this.  Though
I'd normally expect the amount of extra RAM used to be smallish...

Do you have a high merge factor?  Can you run CheckIndex on your index
(java org.apache.lucene.index.CheckIndex /path/to/index) and post the
output back?

Currently IndexWriter will flush these deletes on kicking off a merge,
or if commit() is called, so one workaround you could try is to call
commit() every so often and see if that improves the RAM usage?
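
A minimal sketch of that workaround, assuming one long-lived writer, a hypothetical document source, and an illustrative "pk" key field:

    int added = 0;
    for (Document doc : docsFromDb) {                           // hypothetical source of documents
        writer.updateDocument(new Term("pk", doc.get("pk")), doc);
        if (++added % 10000 == 0) {
            writer.commit();                                    // flushes buffered deletes (and added docs)
        }
    }
    writer.commit();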

>>Are you calling updateDocument (which deletes then adds)?
> Yes I do, I do not know in my code whether the document is already indexed or not. In my test case I do delete the
> complete index before the run, so all documents should be new to the index. I still use update though, since I
> this piece of code is generic.

OK that's a good reason to use updateDocument.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

>But a "leak" would keep leaking over time, right?  Ie even a 1 GB heap
>on your test db should eventually throw OOME if there's really a leak.
No, not necessarily, since I stop indexing once everything is indexed - I shall try repeated runs with 120MB.

>Are you calling updateDocument (which deletes then adds)?
Yes I do; I do not know in my code whether the document is already indexed or not. In my test case I do delete the
complete index before the run, so all documents should be new to the index. I still use update though, since
this piece of code is generic.

I shall post my results.

Stefan



Re: OutOfMemoryError using IndexWriter

Posted by Simon Willnauer <si...@googlemail.com>.
On Thu, Jun 25, 2009 at 1:13 PM, Michael
McCandless<lu...@mikemccandless.com> wrote:
> Can you post your test code?  If you can make it a standalone test,
> then I can repro and dig down faster.
>
> Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
> and see if that prevents the OOM?
>
> Mike
>
> On Thu, Jun 25, 2009 at 7:10 AM, stefan<st...@intermediate.de> wrote:
>>
>> Hi Mike,
>>
>> I just changed my test-code to run in an indefinite loop over the database to index everything. Set the jvm to 120MB heap size, all other parameters as before.
>> I got an OOError just as before - so I would say there is a leak somewhere.
>>
>> Here is the histogram.
>>
>> Heap Histogram
>>
>> All Classes (excluding platform)
>> Class   Instance Count  Total Size
>> class [B        1809102         41992326
>> class [C        200610  26877068
>> class [[B       46117   9473872
>> class java.lang.String  198629  3178064
>> class org.apache.lucene.index.FreqProxTermsWriter$PostingList   100927  2825956
>> class [Ljava.util.HashMap$Entry;        11329   2494312
>> class java.util.HashMap$Entry   132578  2121248
>> class [I        5186    2097300
I would be interested in what happens to DocumentsWriter#deletesInRam - can
you check whether this keeps on growing even after flushes?

simon
>>
>> So far I had no success in pinpointing those binary arrays, I will need some more time for this.
>>
>> Stefan
>>
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Wed 24.06.2009 17:50
>> To: java-user@lucene.apache.org
>> Subject: Re: OutOfMemoryError using IndexWriter
>>
>> On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>>>
>>> Hi,
>>>
>>>
>>>>OK so this means it's not a leak, and instead it's just that stuff is
>>>>consuming more RAM than expected.
>>> Or that my test db is smaller than the production db which is indeed the case.
>>
>> But a "leak" would keep leaking over time, right?  Ie even a 1 GB heap
>> on your test db should eventually throw OOME if there's really a leak.
>>
>>> Please explain those buffered deletes in a few more details.
>>
>> Are you calling updateDocument (which deletes then adds)?
>>
>> Deletes (the Term or Query you pass to updateDocument or
>> deleteDocuments) are buffered in a HashMap and then that buffer is
>> materialized into actual deleted doc IDs when IndexWriter decides to
>> do so.  I think IndexWriter isn't properly flushing the deletes when
>> they use too much RAM.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi Mike,

I ran a burn-in test overnight, repeatedly indexing the same db in a loop. I set the heap size to 120MB and called setMaxBufferedDeleteTerms(1000); I did not call commit and used the same IndexWriter.
This test passed without any errors.

So to wrap this up - I shall call commit every 10000 documents added; the performance impact of calling setMaxBufferedDeleteTerms is quite significant (I guess somewhere between a factor of 2 and 10). I'll check on the speed of commit.
Commit also seems to free more memory (since it runs with a 100MB heap size).
I'll have to give up on tracking down those binary references, I spent too much time on this already.

Thanks a lot for your insights.

Stefan




-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Thu 25.06.2009 15:57
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
 
Interesting that excessive deletes buffering is not your problem...

Even if you can't post the resulting test case, if you can simplify it
& run locally, to rule out anything outside Lucene that's allocating
the byte/char/byte[] arrays, that can help to isolate.

Also, profilers can trace where allocations happened, eg YourKit.

Mike

On Thu, Jun 25, 2009 at 9:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I'm afraid my test setup and code this is far too big.
> What I use lucene for is fairly simple. I have a database with about 150 tables, I iterate all tables and create for each row a String representation similar to a toString method containing all database data. This string is then fed together with the primary key to lucene. Full-text search of my db is then possible. Each document in Lucene represents a row in the database.
>
> I tried calling setMaxBufferedDeleteTerms with 100MB heap size to no avail, but calling commit every 10000 documents does help. I assume a commit is similar to creating a new IndexWriter.
>
> HTH,
>
> Stefan
>
>
>
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Thu 25.06.2009 13:13
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> Can you post your test code?  If you can make it a standalone test,
> then I can repro and dig down faster.
>
> Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
> and see if that prevents the OOM?
>
> Mike
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
Interesting that excessive deletes buffering is not your problem...

Even if you can't post the resulting test case, if you can simplify it
& run locally, to rule out anything outside Lucene that's allocating
the byte/char/byte[] arrays, that can help to isolate.

Also, profilers can trace where allocations happened, eg YourKit.

Mike

On Thu, Jun 25, 2009 at 9:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I'm afraid my test setup and code this is far too big.
> What I use lucene for is fairly simple. I have a database with about 150 tables, I iterate all tables and create for each row a String representation similar to a toString method containing all database data. This string is then fed together with the primary key to lucene. Full-text search of my db is then possible. Each document in Lucene represents a row in the database.
>
> I tried calling setMaxBufferedDeleteTerms  with 100MB heap size to no avail, but calling commit every 10000 documents does help. I assume a commit is similar to creating a new IndexWriter.
>
> HTH,
>
> Stefan
>
>
>
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Thu 25.06.2009 13:13
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> Can you post your test code?  If you can make it a standalone test,
> then I can repro and dig down faster.
>
> Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
> and see if that prevents the OOM?
>
> Mike
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

I'm afraid my test setup and code are far too big for that.
What I use Lucene for is fairly simple: I have a database with about 150 tables; I iterate over all tables and create for each row a String representation, similar to a toString method, containing all of the row's data. This string is then fed to Lucene together with the primary key, making full-text search of my db possible. Each document in Lucene represents a row in the database.
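
A sketch of that scheme against the Lucene 2.4 API; the field names and the method signature are placeholders, not the actual code:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // One Lucene Document per database row: the primary key is stored untokenized,
    // the toString-like representation of the row is analyzed for full-text search.
    static void indexRow(IndexWriter writer, String primaryKey, String rowAsText) throws IOException {
        Document doc = new Document();
        doc.add(new Field("pk", primaryKey, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("contents", rowAsText, Field.Store.YES, Field.Index.ANALYZED));
        // updateDocument deletes any existing document with this key, then adds the new one
        writer.updateDocument(new Term("pk", primaryKey), doc);
    }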

I tried calling setMaxBufferedDeleteTerms  with 100MB heap size to no avail, but calling commit every 10000 documents does help. I assume a commit is similar to creating a new IndexWriter.

HTH,

Stefan




-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Thu 25.06.2009 13:13
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
 
Can you post your test code?  If you can make it a standalone test,
then I can repro and dig down faster.

Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
and see if that prevents the OOM?

Mike



Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
Can you post your test code?  If you can make it a standalone test,
then I can repro and dig down faster.

Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000)
and see if that prevents the OOM?

Mike

On Thu, Jun 25, 2009 at 7:10 AM, stefan<st...@intermediate.de> wrote:
>
> Hi Mike,
>
> I just changed my test-code to run in an indefinite loop over the database to index everything. Set the jvm to 120MB heap size, all other parameters as before.
> I got an OOError just as before - so I would say there is a leak somewhere.
>
> Here is the histogram.
>
> Heap Histogram
>
> All Classes (excluding platform)
> Class   Instance Count  Total Size
> class [B        1809102         41992326
> class [C        200610  26877068
> class [[B       46117   9473872
> class java.lang.String  198629  3178064
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList   100927  2825956
> class [Ljava.util.HashMap$Entry;        11329   2494312
> class java.util.HashMap$Entry   132578  2121248
> class [I        5186    2097300
>
> So far I had no success in pinpointing those binary arrays, I will need some more time for this.
>
> Stefan
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wed 24.06.2009 17:50
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>>
>> Hi,
>>
>>
>>>OK so this means it's not a leak, and instead it's just that stuff is
>>>consuming more RAM than expected.
>> Or that my test db is smaller than the production db which is indeed the case.
>
> But a "leak" would keep leaking over time, right?  Ie even a 1 GB heap
> on your test db should eventually throw OOME if there's really a leak.
>
>> Please explain those buffered deletes in a few more details.
>
> Are you calling updateDocument (which deletes then adds)?
>
> Deletes (the Term or Query you pass to updateDocument or
> deleteDocuments) are buffered in a HashMap and then that buffer is
> materialized into actual deleted doc IDs when IndexWriter decides to
> do so.  I think IndexWriter isn't properly flushing the deletes when
> they use too much RAM.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi Mike,

I just changed my test code to run in an indefinite loop over the database, indexing everything. I set the JVM to 120MB heap size, all other parameters as before.
I got an OOError just as before - so I would say there is a leak somewhere.

Here is the histogram.

Heap Histogram

All Classes (excluding platform)
Class 	Instance Count 	Total Size
class [B 	1809102 	41992326
class [C 	200610 	26877068
class [[B 	46117 	9473872
class java.lang.String 	198629 	3178064
class org.apache.lucene.index.FreqProxTermsWriter$PostingList 	100927 	2825956
class [Ljava.util.HashMap$Entry; 	11329 	2494312
class java.util.HashMap$Entry 	132578 	2121248
class [I 	5186 	2097300 

So far I had no success in pinpointing those binary arrays, I will need some more time for this.

Stefan

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wed 24.06.2009 17:50
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
 
On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>
> Hi,
>
>
>>OK so this means it's not a leak, and instead it's just that stuff is
>>consuming more RAM than expected.
> Or that my test db is smaller than the production db which is indeed the case.

But a "leak" would keep leaking over time, right?  Ie even a 1 GB heap
on your test db should eventually throw OOME if there's really a leak.

> Please explain those buffered deletes in a few more details.

Are you calling updateDocument (which deletes then adds)?

Deletes (the Term or Query you pass to updateDocument or
deleteDocuments) are buffered in a HashMap and then that buffer is
materialized into actual deleted doc IDs when IndexWriter decides to
do so.  I think IndexWriter isn't properly flushing the deletes when
they use too much RAM.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 24, 2009 at 10:18 AM, stefan<st...@intermediate.de> wrote:
>
> Hi,
>
>
>>OK so this means it's not a leak, and instead it's just that stuff is
>>consuming more RAM than expected.
> Or that my test db is smaller than the production db which is indeed the case.

But a "leak" would keep leaking over time, right?  Ie even a 1 GB heap
on your test db should eventually throw OOME if there's really a leak.

> Please explain those buffered deletes in a few more details.

Are you calling updateDocument (which deletes then adds)?

Deletes (the Term or Query you pass to updateDocument or
deleteDocuments) are buffered in a HashMap and then that buffer is
materialized into actual deleted doc IDs when IndexWriter decides to
do so.  I think IndexWriter isn't properly flushing the deletes when
they use too much RAM.
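
For reference, the calls that buffer such deletes, assuming an open writer, an already-built Document named newDoc, and an illustrative "pk" field:

    writer.deleteDocuments(new Term("pk", "42"));            // buffers a delete-by-term
    writer.updateDocument(new Term("pk", "42"), newDoc);     // buffers the same delete, then adds newDoc
    writer.commit();                                         // flushes the buffered deletes to the index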

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,


>OK so this means it's not a leak, and instead it's just that stuff is
>consuming more RAM than expected.
Or that my test db is smaller than the production db which is indeed the case.

>Hmm -- there are quite a few buffered deletes pending.  It could be we
>are under-accounting for RAM used by buffered deletes.  I'll dig on
>that.
That sounds promising to me. How does a delete happen? We are talking about a complete re-indexing from scratch; no deletes at all should be happening. Are you saying that I remove docs from the index?


>Also, your char[]'s are taking ~30 MB, byte[] ~26MB, which is odd if
>your RAM buffer is 16MB.  Does your app create these?
A fair amount is created by my app - a histogram taken without indexing shows about 10MB of chars created by my app:
class [C 	132399 	9457148,
though a few more could be created during indexing.

> Why is it, that creating a new Index Writer will let the indexing run fine with 80MB, but keeping it will create an
> OutOfMemoryException running with 100MB heap size ?

Please explain those buffered deletes in a bit more detail.

Thanks,

Stefan


Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 24, 2009 at 7:43 AM, stefan<st...@intermediate.de> wrote:

> I tried with 100MB heap size and got the Error as well, it runs fine with 120MB.

OK so this means it's not a leak, and instead it's just that stuff is
consuming more RAM than expected.

> Here is the histogram (application classes marked with --)
>
> Heap Histogram
>
> All Classes (excluding platform)
> Class   Instance Count  Total Size
> class [C        234200  30245722
> class [B        1087565         25999145
> class [[B       28430   4890060
> class java.lang.String  232351  3717616
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList   99584   2788352
> class java.util.HashMap$Entry   171031  2736496
> class [Ljava.util.HashMap$Entry;        9563    2371256
> class [Ljava.lang.Object;       31820   1820224
> class ---       4474    1753808
> class [I        4337    1567796
> class java.lang.reflect.Method  19774   1364406
> class org.apache.lucene.index.Term      117982  943856
> class [Lorg.apache.lucene.index.RawPostingList;         12      770012
> class ---       1837    490479
> class org.apache.lucene.index.BufferedDeletes$Num       117303  469212
>
> The --- as well was the reflect.Method are part of the app's data.

Hmm -- there are quite a few buffered deletes pending.  It could be we
are under-accounting for RAM used by buffered deletes.  I'll dig on
that.

Also, your char[]'s are taking ~30 MB, byte[] ~26MB, which is odd if
your RAM buffer is 16MB.  Does your app create these?

> Why is it, that creating a new Index Writer will let the indexing run fine with 80MB, but keeping it will create an
> OutOfMemoryException running with 100MB heap size ?

Could be because you are buffering so many deletes (if indeed Lucene
doesn't account for that RAM consumption properly)...

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

I tried with 100MB heap size and got the Error as well, it runs fine with 120MB.

Here is the histogram (application classes marked with --)

Heap Histogram

All Classes (excluding platform)
Class 	Instance Count 	Total Size
class [C 	234200 	30245722
class [B 	1087565 	25999145
class [[B 	28430 	4890060
class java.lang.String 	232351 	3717616
class org.apache.lucene.index.FreqProxTermsWriter$PostingList 	99584 	2788352
class java.util.HashMap$Entry 	171031 	2736496
class [Ljava.util.HashMap$Entry; 	9563 	2371256
class [Ljava.lang.Object; 	31820 	1820224
class --- 	4474 	1753808
class [I 	4337 	1567796
class java.lang.reflect.Method 	19774 	1364406
class org.apache.lucene.index.Term 	117982 	943856
class [Lorg.apache.lucene.index.RawPostingList; 	12 	770012
class --- 	1837 	490479
class org.apache.lucene.index.BufferedDeletes$Num 	117303 	469212 

The --- entries as well as the reflect.Method instances are part of the app's data.

Why is it that creating a new IndexWriter lets the indexing run fine with 80MB, but keeping the same one
causes an OutOfMemoryError even with a 100MB heap size?

Stefan


-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wed 24.06.2009 11:52
To: java-user@lucene.apache.org
Subject: Re: OutOfMemoryError using IndexWriter
 
Hmm -- I think your test env (80 MB heap, 50 MB used by app + 16 MB
IndexWriter RAM buffer) is a bit too tight.  The 16 MB buffer for IW
is not a hard upper bound on how much RAM it may use.  EG when merges
are running, more RAM will be required, if a large doc brought it over
the 16 MB limit it will consume more, etc.

~3 MB used by PostingList is reasonable.

If after fixing the problem in your code, with a larger heap size
you're still running out of RAM, then please post the full histogram
from the resulting heap dump at which point the offender will be
obvious.

Or, can you make the problem happen with a smallish test case?

Mike

On Wed, Jun 24, 2009 at 5:37 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I do not set a RAM Buffer size, I assume default is 16MB.
> My server runs with 80MB heap size, before starting lucene about 50MB is used. In a production environment I run in this problem with heap size set to 750MB with no other activity on the server (nighttime), though since then I diagnosed some problem with my code as well. I just reproduced it with 80MB but I guess I can reproduce it with 100MB heap as well, just takes longer.
>
> Here is the stack, I keep the dump for
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to c:\auto_heap_intern.prof ...
> Heap dump file created [97173841 bytes in 3.534 secs]
> ERROR lucene.SearchManager       - Failure in index daemon:
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.HashSet.<init>(HashSet.java:86)
>        at org.apache.lucene.index.DocumentsWriter.initFlushState(DocumentsWriter.java:540)
>        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1703)
>        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3534)
>        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>
> Heap Histogram shows:
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList   116736 (instances)      3268608 (size)
>
> Well, something I should do differently ?
>
> Stefan
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wed 24.06.2009 10:48
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> How large is the RAM buffer that you're giving IndexWriter?  How large
> a heap size do you give to JVM?
>
> Can you post one of the OOM exceptions you're hitting?
>
> Mike
>
> On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
>> Hi,
>>
>> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
>> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
>> recommended in the performance hints.
>> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
>> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
>> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>>
>> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>>
>> Thanks,
>>
>> Stefan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmm -- I think your test env (80 MB heap, 50 MB used by app + 16 MB
IndexWriter RAM buffer) is a bit too tight.  The 16 MB buffer for IW
is not a hard upper bound on how much RAM it may use.  EG when merges
are running, more RAM will be required, if a large doc brought it over
the 16 MB limit it will consume more, etc.
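
For reference, the buffer being discussed; 16 MB is the 2.4 default, and the call below only sets a flush target, not a hard cap:

    writer.setRAMBufferSizeMB(16.0);   // IndexWriter.DEFAULT_RAM_BUFFER_SIZE_MB; flushing is triggered near this size,
                                       // but merges and large documents can temporarily push usage well above it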

~3 MB used by PostingList is reasonable.

If after fixing the problem in your code, with a larger heap size
you're still running out of RAM, then please post the full histogram
from the resulting heap dump at which point the offender will be
obvious.

Or, can you make the problem happen with a smallish test case?

Mike

On Wed, Jun 24, 2009 at 5:37 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I do not set a RAM Buffer size, I assume default is 16MB.
> My server runs with 80MB heap size, before starting lucene about 50MB is used. In a production environment I run in this problem with heap size set to 750MB with no other activity on the server (nighttime), though since then I diagnosed some problem with my code as well. I just reproduced it with 80MB but I guess I can reproduce it with 100MB heap as well, just takes longer.
>
> Here is the stack, I keep the dump for
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to c:\auto_heap_intern.prof ...
> Heap dump file created [97173841 bytes in 3.534 secs]
> ERROR lucene.SearchManager       - Failure in index daemon:
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.HashSet.<init>(HashSet.java:86)
>        at org.apache.lucene.index.DocumentsWriter.initFlushState(DocumentsWriter.java:540)
>        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1703)
>        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3534)
>        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>
> Heap Histogram shows:
> class org.apache.lucene.index.FreqProxTermsWriter$PostingList   116736 (instances)      3268608 (size)
>
> Well, something I should do differently ?
>
> Stefan
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wed 24.06.2009 10:48
> To: java-user@lucene.apache.org
> Subject: Re: OutOfMemoryError using IndexWriter
>
> How large is the RAM buffer that you're giving IndexWriter?  How large
> a heap size do you give to JVM?
>
> Can you post one of the OOM exceptions you're hitting?
>
> Mike
>
> On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
>> Hi,
>>
>> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
>> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
>> recommended in the performance hints.
>> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
>> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
>> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>>
>> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>>
>> Thanks,
>>
>> Stefan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

I do not set a RAM buffer size, so I assume the default of 16MB applies.
My server runs with an 80MB heap; before Lucene starts, about 50MB of that is already in use. In a production environment I ran into this problem with the heap set to 750MB and no other activity on the server (nighttime), though since then I have also diagnosed a problem in my own code. I just reproduced it with 80MB, but I expect I could reproduce it with a 100MB heap as well; it would just take longer.
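
(For reference, a minimal sketch of how the writer's RAM buffer can be bounded explicitly on Lucene 2.4 -- the index path and class name are illustrative, not the actual application code:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class WriterSetup {
    public static IndexWriter openWriter() throws Exception {
        // Illustrative index location only.
        Directory dir = FSDirectory.getDirectory("/tmp/demo-index");
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.LIMITED);
        // 16.0 is also the 2.4 default; lower it if the heap is tight,
        // raise it (with a larger heap) for faster bulk indexing.
        writer.setRAMBufferSizeMB(16.0);
        return writer;
    }
}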

Here is the stack trace (I have kept the heap dump in case it is needed):
java.lang.OutOfMemoryError: Java heap space
Dumping heap to c:\auto_heap_intern.prof ...
Heap dump file created [97173841 bytes in 3.534 secs]
ERROR lucene.SearchManager       - Failure in index daemon: 
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashSet.<init>(HashSet.java:86)
        at org.apache.lucene.index.DocumentsWriter.initFlushState(DocumentsWriter.java:540)
        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1703)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3534)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)

Heap Histogram shows:
class org.apache.lucene.index.FreqProxTermsWriter$PostingList 	116736 (instances) 	3268608 (size)

Well, is there something I should do differently?

Stefan

-----Ursprüngliche Nachricht-----
Von: Michael McCandless [mailto:lucene@mikemccandless.com]
Gesendet: Mi 24.06.2009 10:48
An: java-user@lucene.apache.org
Betreff: Re: OutOfMemoryError using IndexWriter
 
How large is the RAM buffer that you're giving IndexWriter?  How large
a heap size do you give to JVM?

Can you post one of the OOM exceptions you're hitting?

Mike

On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>
> Thanks,
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemoryError using IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.
How large is the RAM buffer that you're giving IndexWriter?  How large
a heap size do you give to JVM?

Can you post one of the OOM exceptions you're hitting?

Mike

On Wed, Jun 24, 2009 at 4:08 AM, stefan<st...@intermediate.de> wrote:
> Hi,
>
> I am using Lucene 2.4.1 to index a database with less than a million records. The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index the complete database. This is though
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that this is a problem related to Lucene.
>
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1.
>
> Thanks,
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: OutOfMemoryError using IndexWriter

Posted by stefan <st...@intermediate.de>.
Hi,

there seems to be a little misunderstanding. The index is only optimized when the IndexWriter is about to be closed, and then only with a probability of 2% (i.e. occasionally).
In other words, I only close the IndexWriter (and thus occasionally optimize) to avoid the OutOfMemoryError.
When I keep the same IndexWriter for the complete indexing operation, I never call optimize, yet I still get an OutOfMemoryError.
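
(For clarity, a rough sketch of that close/re-open workaround, assuming Lucene 2.4 -- the batch size, the every-50th-close optimize and the document source are illustrative placeholders:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class PeriodicCloseIndexer {
    // dir and docs come from the application; they are placeholders here.
    static void index(Directory dir, Iterable<Document> docs) throws Exception {
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.LIMITED);
        int added = 0, closes = 0;
        for (Document doc : docs) {
            writer.addDocument(doc);
            if (++added % 10000 == 0) {           // every 10000 objects ...
                if (++closes % 50 == 0) {         // ... and on roughly every 50th close ...
                    writer.optimize();            // ... optimize before closing
                }
                writer.close();                   // closing flushes and frees the writer's buffers
                writer = new IndexWriter(dir, new StandardAnalyzer(),
                        false, IndexWriter.MaxFieldLength.LIMITED); // re-open, create = false
            }
        }
        writer.close();
    }
}

Closing and re-opening like this discards the writer's in-memory state each time, which is presumably why it sidesteps the error at the cost of extra flushes.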

Stefan



-----Ursprüngliche Nachricht-----
Von: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Gesendet: Mi 24.06.2009 14:22
An: java-user@lucene.apache.org
Betreff: Re: OutOfMemoryError using IndexWriter
 

Hi Stefan,

While not directly the source of your problem, I have a feeling you are optimizing too frequently (and wasting time/CPU by doing so).  Is there a reason you optimize so often?  Try optimizing only at the end, when you know you won't be adding any more documents to the index for a while.

I'm also wondering why you close/re-open the IndexWriter, as it looks like you are doing batch indexing.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: stefan <st...@intermediate.de>
> To: java-user@lucene.apache.org
> Sent: Wednesday, June 24, 2009 4:08:43 AM
> Subject: OutOfMemoryError using IndexWriter
> 
> Hi,
> 
> I am using Lucene 2.4.1 to index a database with less than a million records. 
> The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index 
> the complete database. This is though 
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close 
> actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the 
> recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that 
> this is a problem related to Lucene.
> 
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but 
> they all seem closed in Version 2.4.1.
> 
> Thanks,
> 
> Stefan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: OutOfMemoryError using IndexWriter

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Stefan,

While not directly the source of your problem, I have a feeling you are optimizing too frequently (and wasting time/CPU by doing so).  Is there a reason you optimize so often?  Try optimizing only at the end, when you know you won't be adding any more documents to the index for a while.

I'm also wondering why you close/re-open the IndexWriter, as it looks like you are doing batch indexing.
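
(For what it's worth, a minimal sketch of that pattern -- one writer for the whole run and a single optimize at the end -- with a hypothetical document source, assuming Lucene 2.4:)

import java.util.Collections;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Illustrative index location only.
        Directory dir = FSDirectory.getDirectory("/tmp/demo-index");
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.LIMITED);
        try {
            for (Document doc : loadDocumentsFromDatabase()) {   // hypothetical source
                writer.addDocument(doc);
            }
            writer.optimize();   // once, after the last document has been added
        } finally {
            writer.close();
        }
    }

    // Placeholder for the application's own record-to-Document mapping.
    private static Iterable<Document> loadDocumentsFromDatabase() {
        return Collections.<Document>emptyList();
    }
}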

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: stefan <st...@intermediate.de>
> To: java-user@lucene.apache.org
> Sent: Wednesday, June 24, 2009 4:08:43 AM
> Subject: OutOfMemoryError using IndexWriter
> 
> Hi,
> 
> I am using Lucene 2.4.1 to index a database with less than a million records. 
> The resulting index is about 50MB in size.
> I keep getting an OutOfMemory Error if I re-use the same IndexWriter to index 
> the complete database. This is though 
> recommended in the performance hints.
> What I now do is, every 10000 Objects I close the index (and every 50 close 
> actions optimize it) and create a new
> IndexWriter to continue. This process works fine, but to me seems hardly the 
> recommended way to go.
> I've been using jhat/jmap as well as Netbeans profiler and am fairly sure that 
> this is a problem related to Lucene.
> 
> Any Ideas - or post this to Jira ? Jira has quite a few OutOfMemory postings but 
> they all seem closed in Version 2.4.1.
> 
> Thanks,
> 
> Stefan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org