Posted to dev@lucenenet.apache.org by "Steven (JIRA)" <ji...@apache.org> on 2012/05/01 12:32:50 UTC

[jira] [Created] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Steven created LUCENENET-488:
--------------------------------

             Summary: Can't open IndexReader, get OutOFMemory Exception
                 Key: LUCENENET-488
                 URL: https://issues.apache.org/jira/browse/LUCENENET-488
             Project: Lucene.Net
          Issue Type: Bug
          Components: Lucene.Net Core
    Affects Versions: Lucene.Net 2.9.4g
         Environment: Windows server 2008R2
            Reporter: Steven


I have built a large database with ~1Bn records (2 items per document); it is 200GB on disk. I managed to write the index by chunking into 100,000 blocks, as I ended up with some threading issues (another bug submission). Anyway, the index is built, but I can't open it and get an out-of-memory exception (Process Explorer gets to 1.5GB allocated before it dies; I'm not sure how reliable that is, but I do know there is plenty more RAM left on the box).
Stack trace below:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at Lucene.Net.Index.TermInfosReader..ctor(Directory dir, String seg, FieldInfos fis, Int32 readBufferSize, Int32 indexDivisor)
   at Lucene.Net.Index.SegmentReader.CoreReaders..ctor(SegmentReader origInstance, Directory dir, SegmentInfo si, Int32 readBufferSize, Int32 termsIndexDivisor)
   at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, Directory dir, SegmentInfo si, Int32 readBufferSize, Boolean doOpenStores, Int32 termInfosIndexDivisor)
   at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, SegmentInfo si, Int32 termInfosIndexDivisor)
   at Lucene.Net.Index.DirectoryReader..ctor(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, Boolean readOnly, Int32 termInfosIndexDivisor)
   at Lucene.Net.Index.DirectoryReader.<>c__DisplayClass1.<Open>b__0(String segmentFileName)
   at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
   at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
   at Lucene.Net.Index.IndexReader.Open(String path, Boolean readOnly)
   at Lucene.Net.Demo.SearchFiles.Main(String[] args)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Posted by "Simon Svensson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265747#comment-13265747 ] 

Simon Svensson commented on LUCENENET-488:
------------------------------------------

The 1.5 GiB limit sounds like you're running a 32-bit application. Is this correct?

Does it work if you call the overload of IndexReader.Open that accepts a termInfosIndexDivisor directly? (You can pass null for the deletion policy to use the default one.) The default termInfosIndexDivisor is one; increasing it will decrease the amount of memory required. This will slow down some term-related operations against the index, but that sounds better than not being able to open it at all.

There is some information about what data is loaded into memory at http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html
                

[jira] [Closed] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Posted by "Prescott Nasser (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prescott Nasser closed LUCENENET-488.
-------------------------------------

    

[jira] [Commented] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Posted by "Steven (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266213#comment-13266213 ] 

Steven commented on LUCENENET-488:
----------------------------------

I do agree, and the info above is very helpful. I guess now at least others will find the issue and its resolution should they face the same problem, so thanks for your help.
                

[jira] [Commented] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Posted by "Simon Svensson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265916#comment-13265916 ] 

Simon Svensson commented on LUCENENET-488:
------------------------------------------

The following may be off since I don't know the inner technical workings of Lucene.Net.

All terms in your index are read into an in-memory index when opening an IndexReader. The termInfosIndexDivisor tells the IndexReader instance to read every n-th term into this index. The default value, 1, will cause every term to be loaded into memory. Using termInfosIndexDivisor=2 means that you'll read every second term into memory, theoretically halving the required memory size. Your value, 10, would only consume a tenth of the memory compared to termInfosIndexDivisor=1.
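
To make the arithmetic concrete, here is a minimal Python sketch of the sampling described above. It is a simplified model, not Lucene.Net code: the function name and the toy term list are made up for illustration, and as far as I know the real on-disk term index is itself already a sample of the terms, so the absolute numbers will differ. The ratios are the point.

```python
def sample_term_index(sorted_terms, divisor):
    """Keep every divisor-th term (starting with the first) as the in-memory index.

    Hypothetical helper: models only the "read every n-th term" behaviour
    described in the comment, not Lucene.Net's actual TermInfosReader.
    """
    if divisor < 1:
        raise ValueError("divisor must be >= 1")
    return sorted_terms[::divisor]


# Toy stand-in for the sorted term dictionary of a segment.
terms = [f"term{i:04d}" for i in range(1000)]

# Number of terms held in memory for each divisor.
sizes = {d: len(sample_term_index(terms, d)) for d in (1, 2, 10)}
print(sizes)
```

With 1000 toy terms, a divisor of 2 keeps 500 of them and a divisor of 10 keeps 100, which matches the "half" and "a tenth" figures above.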

This comes at a price; as 9 out of 10 terms are not cached in memory, they take longer to retrieve. This happens in many cases, like a new TermQuery(new Term("f", "test")). It needs to seek to the indexed term, then iterate forward until it matches the correct term. This could be, if "teargas" was the indexed term: teargas > technicians > tegument > teleconference > temporal > tenotomy > teocalli > terbium > test. Instead of being able to seek directly to the term, we now seek to a term before it and iterate the list for another 8 terms. (It would still go faster than the time it took for me to find odd example words...)
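
The seek-then-scan lookup can be sketched the same way. This is a Python illustration under the same simplifying assumptions, with a hypothetical lookup function and the example words from above (plus one extra word to make ten); it is not how Lucene.Net is implemented internally. It counts how many terms have to be stepped over after seeking to the nearest sampled term.

```python
import bisect


def lookup(sorted_terms, divisor, target):
    """Return (found, steps): whether target exists, and how many terms were
    scanned past the seek point. Hypothetical model of a sampled term index."""
    index_terms = sorted_terms[::divisor]  # the in-memory sample
    # Seek: rightmost sampled term that is <= target.
    slot = bisect.bisect_right(index_terms, target) - 1
    if slot < 0:
        return False, 0  # target sorts before the first term
    pos = slot * divisor  # position of that sampled term in the full list
    steps = 0
    # Scan forward through the unsampled terms until we reach or pass target.
    while pos < len(sorted_terms) and sorted_terms[pos] < target:
        pos += 1
        steps += 1
    found = pos < len(sorted_terms) and sorted_terms[pos] == target
    return found, steps


TERMS = ["teargas", "technicians", "tegument", "teleconference", "temporal",
         "tenotomy", "teocalli", "terbium", "test", "tests"]

print(lookup(TERMS, 1, "test"))   # every term sampled: no scanning needed
print(lookup(TERMS, 10, "test"))  # only "teargas" sampled: scan forward
```

With a divisor of 1 the lookup lands directly on "test"; with a divisor of 10 it seeks to "teargas" and steps over 8 terms, as in the example above.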

I've never measured this, but I doubt that low numbers will cause much trouble. Any term except "teargas" would need to read the term information from disk, and this disk read will [probably] end up in the file system cache. I can see a problem if the number is high enough to cause a second disk read, but at what value of termInfosIndexDivisor this happens is system-dependent. The size of the disk reads, the amount of data per term, etc., would affect this. I guess you could use a low-level monitoring tool (Process Monitor?) to see every read if you really want to find the "perfect" number.

I believe this bug report can be closed as invalid; it was a case of default values that did not work out for 200 GiB indexes. Do you agree on this, Steven?
                

[jira] [Resolved] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Posted by "Prescott Nasser (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prescott Nasser resolved LUCENENET-488.
---------------------------------------

    Resolution: Invalid

Intended behavior; see the comments by Simon Svensson for more details.
                

[jira] [Commented] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception

Posted by "Steven (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265773#comment-13265773 ] 

Steven commented on LUCENENET-488:
----------------------------------

Hi Simon, thanks very much. I set the option to 10 (I have no idea what that means, but it works). The reader opens in about 4 seconds, and the search is still hugely impressive (300ms to search through 1bn records and return the first 10).
I will try to build a native 64-bit version on the server itself (my development box is only 32-bit, which might be the problem) and let you know how I get on.
Thanks again, I can't believe you guys do this for free; I pay millions for products that aren't anywhere near as good!
                