You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Artem Vasiliev <ar...@gmail.com> on 2006/04/05 06:59:15 UTC

Re[4]: OutOfMemory with search(Query, Sort)

Hello Hoss,

Thanks for your answer, you're right, filepathes are pretty much
unique. Anyway I don't want this total-field-cache-loading situation occur
in any circumstances - it's too expensive. My app usually crawls while
user searches are performed. Crawl involves additions and deletions so
IndexSearcher get closed relatively frequently. Seems like Lucene
would reload the whole field cache for each new IndexSearcher, which
would be a big hit anyway. So I'll try FieldCache overriding solution
proposed by you and Yonik and may be commit it to Lucene as a patch.

Btw do I understand right that concrete FieldCache class isn't pluggable
at Lucene at the moment?

: >> sort by filePath field which can be 100 bytes at average meaning 400M
: >> RAM for the cache
CH> :
CH> : Well, it's probably not quite that bad...

CH> yeah, but in his case he's dealing with filepaths -- i'm guessing that
CH> each document represents a file, and no two files will have the same path.

CH> some benefit may be gained in spliting the filepath field up into a
CH> dirpath field and a filename field, and then sortinging on "dirpath,
CH> filename" .. this should reduce the size quite a bit if the number of

-- 
Best regards,
 Artem

http://sharehound.sourceforge.net sharehound, the open source filesystems indexer


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re[4]: OutOfMemory with search(Query, Sort)

Posted by Chris Hostetter <ho...@fucit.org>.

: Thanks for your answer, you're right, filepathes are pretty much
: unique. Anyway I don't want this total-field-cache-loading situation occur
: in any circumstances - it's too expensive. My app usually crawls while
: user searches are performed. Crawl involves additions and deletions so
: IndexSearcher get closed relatively frequently. Seems like Lucene
: would reload the whole field cache for each new IndexSearcher, which
: would be a big hit anyway. So I'll try FieldCache overriding solution
: proposed by you and Yonik and may be commit it to Lucene as a patch.

it will, but you can structure your app so that you don't re-open your
IndexSearcher for every query -- just do it on a periodic basis (ie: "if
N time elapsed since last open, and index version has increased, reopen
searcher; sleep S time")

: Btw do I understand right that concrete FieldCache class isn't pluggable
: at Lucene at the moment?

it's definitley pluggable, just assign the implimentation you want to use
to FieldCache.DEFAULT and all coe will use it, but anything that calls
FieldCache.DEFAULT.getStringIndex(...) may expect that the string array
will be populated, so instead of replacing it the FieldCache isntance
completley, you can impliment your own SortField that uses
FieldCache.getCustom to build a cache that only contains the index ints.




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org