
Posted to java-user@lucene.apache.org by Stanislav Jordanov <st...@sirma.bg> on 2005/06/16 15:19:37 UTC

Poor memory performance over a large index

We are in a similar situation.
The index contains about 1,000,000 docs and its total size is 31G (note: 
Gigabytes, not Megabytes).
The problem is not the search speed - it is the memory usage.
Opening the first IndexSearcher and running a query consumes about 325M 
of RAM.
Strangely, opening a second IndexSearcher and running another query 
consumes another 560M of RAM.
In our case the results are always sorted by some column(s).
The app is supposed to be a multithreaded multiuser environment.
At the beginning the design was that each user session had its own 
IndexSearcher.
But later we observed that the first time an IndexSearcher sorts on a 
specific column, it takes significantly more time than subsequent sorts 
on the same column.
This imposes a performance penalty whenever the sort column changes. That's 
why we decided to keep a separate open IndexSearcher for each 
possible sort column.
Each session, when executing its query, obtains the "right" 
IndexSearcher, the one that is already fast at sorting by that specific column.
The problem is that (given the above-mentioned 325M & 540M) memory 
is exhausted too quickly, which leaves no room for 
multiple users running queries simultaneously.

Any idea that may help will be appreciated :)

Regards
Stenly

Erik Hatcher wrote:

>
> On Jun 16, 2005, at 4:08 AM, JM Tinghir wrote:
>
>> I have a 25 Mb index and was wondering if it would be better to divide
>> it into about 10 indexes and search them with MultiSearcher.
>> Would searching be faster this way?
>> The indexing would be faster I guess, as it is getting slower and
>> slower while indexes get bigger.
>> But searching?
>
>
> I think keeping it in one index is preferable in this situation.   
> Perhaps you need to optimize the index?  Could you qualify a bit more  
> about what is slow?  What types of queries?  How "slow" are they?
>
>     Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


Re: Poor memory performance over a large index

Posted by Chris Hostetter <ho...@fucit.org>.

: Strange, but opening a second IndexSearcher and running another query
: consumes another 560M of RAM.
: In our case the results are always sorted by some column(s).
: The app is supposed to be a multithreaded multiuser environment.
: At the beginning the design was that each user session has its own
: IndexSearcher.
: But later we've made an observation that the first time an IndexSearcher
: is sorting on specific column, it takes significantly more time than the
: next sorts on the same column.
: This forms a performance penalty when changing the sort column. That's
: why we've decided to keep a distinct opened IndexSearcher for each
: possible sort column.
: And each session upon executing its query will obtain the "right"
: IndexSearcher that is quick on sorting by that specific column.

Wow.  This seems way more complicated than it needs to be.

The reason you see the memory footprint go up so much with each search
that sorts on a field is the FieldCache.  You should
search around for an explanation/description of what it is, why it exists, and how it
works, but the bottom line is that the time/memory cost is only paid once per
field you sort on, *per* IndexReader ... so opening a separate
IndexSearcher/IndexReader per user is definitely not the best way to go,
but opening a separate IndexSearcher per sort field should also be
unnecessary.  You should be able to use *one* IndexSearcher for all
searches, with the same FieldCache footprint as your one-searcher-per-sort-field
setup, and with reduced complexity.
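
To illustrate the "once per field, *per* IndexReader" cost, here is a toy model in plain Java. It is a hypothetical sketch, not Lucene's FieldCache API: `Reader` stands in for an IndexReader, and the loader that allocates one int per document stands in for whatever Lucene actually reads from the index the first time a field is sorted on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical toy model, NOT Lucene's FieldCache API: per-document sort values are
// loaded once per (reader, field) pair and reused by every later sort through that reader.
public class FieldCacheModel {

    // Stand-in for an IndexReader; the cache keys on identity (equals is not overridden).
    static final class Reader {
        final Function<String, int[]> loader; // hypothetical: one value per document
        Reader(Function<String, int[]> loader) { this.loader = loader; }
    }

    // Weak keys, so entries for readers that are no longer referenced can be collected.
    private final Map<Reader, Map<String, int[]>> cache = new WeakHashMap<>();
    private int loads = 0; // how many times the expensive load actually ran

    public synchronized int[] getInts(Reader reader, String field) {
        Map<String, int[]> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perReader.computeIfAbsent(field, f -> {
            loads++; // first sort on this field through this reader
            return reader.loader.apply(f);
        });
    }

    public synchronized int loadCount() { return loads; }

    public static void main(String[] args) {
        FieldCacheModel fc = new FieldCacheModel();
        Function<String, int[]> loader = field -> new int[1_000_000]; // 1M docs, 4 bytes each

        Reader shared = new Reader(loader);
        fc.getInts(shared, "date");  // slow first sort: loads
        fc.getInts(shared, "date");  // fast: served from the cache
        fc.getInts(shared, "title"); // new field: loads
        System.out.println("one reader, two fields: " + fc.loadCount() + " loads");

        Reader perSession = new Reader(loader);
        fc.getInts(perSession, "date"); // another reader pays again for the same field
        System.out.println("after a second reader: " + fc.loadCount() + " loads");
    }
}
```

Run as-is, it prints `one reader, two fields: 2 loads` and then `after a second reader: 3 loads`: each new sort field costs one load per reader, and each extra reader pays again for every field it sorts on.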

: The problem is that (having in mind the above-mentioned 325M & 540M) the
: memory gets exhausted too quickly which really leaves no room for
: multiple users running queries simultaneously.

The only advice I have is to increase your heap size and minimize the
number of IndexSearchers.
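
Minimizing the number of IndexSearchers usually means sharing one instance across all sessions and threads. Below is a minimal sketch, assuming a hypothetical `SharedSearcher` holder with the searcher type left generic (in a real application `S` would be Lucene's `IndexSearcher` and the opener would open the index). It does not handle reopening the searcher after the index changes. The heap itself is raised with the standard JVM option `-Xmx` (for example `-Xmx2g`).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: one lazily opened searcher shared by every session and thread,
// instead of one per session or one per sort column. S stands in for IndexSearcher.
public final class SharedSearcher<S> {
    private final Supplier<S> opener; // hypothetical factory that opens the index once
    private volatile S instance;

    public SharedSearcher(Supplier<S> opener) { this.opener = opener; }

    // Double-checked locking on a volatile field: the opener runs at most once.
    public S get() {
        S s = instance;
        if (s == null) {
            synchronized (this) {
                s = instance;
                if (s == null) {
                    s = opener.get();
                    instance = s;
                }
            }
        }
        return s;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger opened = new AtomicInteger();
        SharedSearcher<Object> holder =
            new SharedSearcher<>(() -> { opened.incrementAndGet(); return new Object(); });

        ExecutorService sessions = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            sessions.execute(() -> holder.get()); // every "user query" reuses the same searcher
        }
        sessions.shutdown();
        sessions.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("searchers opened: " + opened.get());
    }
}
```

However many sessions run, only one searcher is opened, so the FieldCache for each sort field is built once and shared by everyone.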



-Hoss

