Posted to java-user@lucene.apache.org by Mindaugas Žakšauskas <mi...@gmail.com> on 2009/09/21 16:32:05 UTC

Memory consumed by IndexSearcher

Hi,

I was wondering what a sensible amount of memory for an IndexSearcher
to consume would be. In my application we retain a reference to it for
quicker searches; however, I have become a bit worried that it is a
memory hog. We are using Lucene 2.4.0 on an 8-CPU Linux SMP box; the
JVM is Sun's 1.6.0_14 64-Bit Server VM.
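
Roughly, the searcher is held like this (a simplified sketch; the
index path and class name are just illustrative, not our actual code):

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Long-lived holder: one IndexSearcher is opened once and reused
    // across requests instead of being re-created for every search.
    public class SearcherHolder {
        private final IndexSearcher searcher;

        public SearcherHolder(String indexPath) throws IOException {
            Directory dir = FSDirectory.getDirectory(indexPath); // 2.4-era factory
            this.searcher = new IndexSearcher(dir);
        }

        public IndexSearcher getSearcher() {
            return searcher;
        }
    }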

I am asking because I have ended up with an IndexSearcher having a
retained size [1] of 145 MB. All of this memory is being consumed by
IndexSearcher::reader::subReaders[]. The reader is a MultiSegmentReader
and all subReaders are SegmentReaders. My memory dump showed the
subReaders array holding 37 SegmentReaders of 2 to 5 MB each. I can
send a YourKit screenshot if anyone's interested.

All of that should be viewed in light of the index size on disk,
which is only 22 MB.

I appreciate that all of this memory can be used for legitimate
purposes; however, is there a way to know when it goes over a sensible
limit? Can there be a "sensible" limit at all? Also, is it possible to
set a hard boundary that the IndexSearcher would never go over?

Thanks in advance for all answers.

Regards,
Mindaugas

[1] http://www.yourkit.com/docs/80/help/sizes.jsp



Re: Memory consumed by IndexSearcher

Posted by Karl Wettin <ka...@gmail.com>.
On 23 Sep 2009, at 17:55, Mindaugas Žakšauskas wrote:
>
> I was kind of hinting at resource planning. Every decent
> enterprise application, among other things, has to state its
> memory requirements, and my point was: if it uses memory, how much of
> it needs to be allocated? What are the boundaries?

There is no function for that, if that is what you are asking for.
How much memory your search application will consume depends on how
you have built your index and what features you are using.
>
>> How many fields,
> 80
>
>> do you use norms
> yes
>
>> how many documents do you have
> 786

This is a rather small index. I'm surprised it takes 145 MB of RAM if
the file size of the index is 22 MB. Is this really from just opening
an FSDirectory? Or is it a RAMDirectory? My guess is the latter. If
so, have you benchmarked using FSDirectory instead? The OS-level file
system cache does a pretty good job. This is sort of leaving your
question, but if you have been using a RAMDirectory to gain speed,
then your index is probably small enough to make use of
InstantiatedIndex.
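
Something along these lines, if you want to compare (Lucene 2.4 API;
the index path here is a placeholder):

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    // Searcher over the on-disk index, relying on the OS file system cache:
    IndexSearcher diskSearcher =
        new IndexSearcher(FSDirectory.getDirectory("/path/to/index"));

    // Same index copied entirely onto the JVM heap:
    RAMDirectory ramDir =
        new RAMDirectory(FSDirectory.getDirectory("/path/to/index"));
    IndexSearcher ramSearcher = new IndexSearcher(ramDir);

Run your typical queries against both and compare latency and heap use.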

> Luke says:
> Has deletions? / Optimized? Yes (1614) / No

This says that you have quite some overhead from deleted documents:
roughly twice as much deleted data as live data is just sitting around
unused. So if you are using a RAMDirectory, you should be able to cut
the amount of memory consumed to about a third by optimizing the index.
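
Optimizing is just one call on an IndexWriter opened against the
existing index, roughly like this (the analyzer and path are
placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory("/path/to/index"),
        new StandardAnalyzer(),
        false,                                 // open the existing index
        IndexWriter.MaxFieldLength.UNLIMITED);
    writer.optimize();  // merges segments, dropping deleted documents
    writer.close();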

You still haven't said much about your application, though, so I'm
just guessing at what the problem might be.


      karl


Re: Memory consumed by IndexSearcher

Posted by Karl Wettin <ka...@gmail.com>.
On 23 Sep 2009, at 17:55, Mindaugas Žakšauskas wrote:
>>
> Luke says:
> Has deletions? / Optimized? Yes (1614) / No

Very quick response: try optimizing your index and see what happens.

I'll get back to you unless someone beats me to it.


      karl


Re: Memory consumed by IndexSearcher

Posted by Mindaugas Žakšauskas <mi...@gmail.com>.
Hi Karl,

On Tue, Sep 22, 2009 at 6:58 PM, Karl Wettin <ka...@gmail.com> wrote:
> <..> The things that
> consume the most memory are probably field norms (8 bits per field
> per document unless omitted) and flyweighted terms (String#intern),
> things you can't really do much about.

I was kind of hinting at resource planning. Every decent
enterprise application, among other things, has to state its
memory requirements, and my point was: if it uses memory, how much of
it needs to be allocated? What are the boundaries?

> But it is hard to say for sure when you've said so little about what your
> index really looks like.

> How many fields,
80

> do you use norms
yes

> how many documents do you have
786

> how many unique terms are there, etc?
Luke says:
Has deletions? / Optimized? Yes (1614) / No
Index format: -7 (Lucene 2.4)
Index functionality: lock-less, single norms, shared doc store,
checksum, del count, omitTf
TermInfos index divisor: 1
Directory implementation: org.apache.lucene.store.FSDirectory

Hope this helps.

Regards,
Mindaugas



Re: Memory consumed by IndexSearcher

Posted by Karl Wettin <ka...@gmail.com>.
Hi Mindaugas,

it is - as you sort of point out - the readers associated with your
searcher that consume the memory, not so much the searcher itself.
The things that consume the most memory are probably field norms
(8 bits per field per document unless omitted) and flyweighted terms
(String#intern), things you can't really do much about.
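
If you don't need scoring on a field, its norms can be dropped at
indexing time, something like this (field name and value are
placeholders):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    Field id = new Field("id", "42", Field.Store.YES,
                         Field.Index.NOT_ANALYZED);
    id.setOmitNorms(true);  // skip the one-byte-per-document norm for this field
    doc.add(id);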

But it is hard to say for sure when you've said so little about what  
your index really looks like. How many fields, do you use norms, how  
many documents do you have, how many unique terms are there, etc?

Another thing that can consume a lot of memory is sorting. Not sure if  
that data is bound to the searcher or not though.
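
For reference, a sorted search in the 2.4 API looks roughly like the
sketch below (the field and query are placeholders); the sort values
for that field get loaded into memory the first time they are used:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopFieldDocs;

    // 'searcher' is the long-lived IndexSearcher discussed in this thread
    TopFieldDocs docs = searcher.search(
        new TermQuery(new Term("body", "lucene")),   // placeholder query
        null,                                        // no filter
        10,                                          // top 10 hits
        new Sort(new SortField("title", SortField.STRING)));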



     karl

On 21 Sep 2009, at 16:32, Mindaugas Žakšauskas wrote:

> Hi,
>
> I was wondering what a sensible amount of memory for an IndexSearcher
> to consume would be. In my application we retain a reference to it for
> quicker searches; however, I have become a bit worried that it is a
> memory hog. We are using Lucene 2.4.0 on an 8-CPU Linux SMP box; the
> JVM is Sun's 1.6.0_14 64-Bit Server VM.
>
> I am asking because I have ended up with an IndexSearcher having a
> retained size [1] of 145 MB. All of this memory is being consumed by
> IndexSearcher::reader::subReaders[]. The reader is a MultiSegmentReader
> and all subReaders are SegmentReaders. My memory dump showed the
> subReaders array holding 37 SegmentReaders of 2 to 5 MB each. I can
> send a YourKit screenshot if anyone's interested.
>
> All of that should be viewed in light of the index size on disk,
> which is only 22 MB.
>
> I appreciate that all of this memory can be used for legitimate
> purposes; however, is there a way to know when it goes over a sensible
> limit? Can there be a "sensible" limit at all? Also, is it possible to
> set a hard boundary that the IndexSearcher would never go over?
>
> Thanks in advance for all answers.
>
> Regards,
> Mindaugas
>
> [1] http://www.yourkit.com/docs/80/help/sizes.jsp
>

