Posted to java-user@lucene.apache.org by Ivan Vasilev <iv...@sirma.bg> on 2007/04/06 13:09:38 UTC

Out of memory exception for big indexes

Hi All,

I have the following problem - we get an OutOfMemoryException when
searching on indexes that are 20 - 40 GB in size and contain 10 - 15
million docs.
When we search we run a query that matches all the results, but we
DO NOT fetch all of them - we fetch 100. We also sort by using the
class Sort, and we really need the results sorted on a field that is
chosen freely by the user.
So my questions are:
1) Does Lucene have any restrictions on the size of index it can
search?
2) Is there an approach to estimate beforehand the RAM that Lucene
will use for a certain query? I mean, what exactly does this memory
usage depend on - index size, the number of docs stored in the index,
the size of those docs...?
3) Is there an approach to control the RAM used - for example, not to
exceed 1 GB of memory when searching?
4) Is there a special approach for working with such big indexes (we
expect even 60 - 80 GB indexes in the near future)?
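[Editorial note on question 2: if the default behaviour is assumed - Lucene's sorted search caches one term value per document for the lifetime of the reader - a rough back-of-envelope estimate is possible. The per-String overhead constants below are illustrative guesses, not measured values:]

```java
// Back-of-envelope estimate of the RAM a default sorted search needs:
// one cached String value per document in the index.
public class SortCacheEstimate {

    // avgFieldChars is the average length of the sort field's value.
    public static long estimateBytes(long numDocs, int avgFieldChars) {
        // ~2 bytes per char (UTF-16) + ~40 bytes of assumed String object
        // overhead + an 8-byte reference slot per document.
        long perDoc = 2L * avgFieldChars + 40 + 8;
        return numDocs * perDoc;
    }

    public static void main(String[] args) {
        // 15 million docs with a 30-char sort field, as in the post above:
        long bytes = estimateBytes(15000000L, 30);
        System.out.println(bytes / (1024 * 1024) + " MB"); // ~1545 MB
    }
}
```

[On this rough model a single sorted field already costs well over 1 GB at 15 mln docs, which matches the OOM symptom described above.]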


Best Regards,
Ivan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Fwd: Re[2]: Out of memory exception for big indexes

Posted by Artem <ab...@gmail.com>.
Hello Nilesh and all!

NB> This seems like a very useful patch. Our application searches over 50
NB> million doc in a 40GB index. We only have simple conjunctive queries
NB> on a single field. Currently, the command line search program that
NB> prints top-10 results requires at least 200mb memory. Our web
NB> application, that searches the same index crashes with OOM when there
NB> are more than 10-12 concurrent requests (heap size set to 3GB). Will
NB> this patch help in such a situation?

I must note that my patch only helps in Lucene OOM situations related to
_sorted_ queries. If that is your case then I think yes, it will help.

In my app the index is currently not so big - only 1 mln docs. With the patch
applied, a sample query returning the first 30 of 120,000 sorted results made
memory consumption jump from 18M to 20M according to jconsole.

NB> It seems that there are some issues with this patch and that was the
NB> reason it is not yet in the main source tree. Can someone please
NB> summarize the downsides of using such an approach? It would be
NB> really good if Lucene had it in the main source tree and a flag to
NB> turn this feature ON or OFF.

First, there's a performance cost (for the second and further queries with the
same IndexSearcher). In the default implementation all the index values of the
sorted field are cached during the first sorted search - this takes memory and
time, but subsequent queries run fast if there is still some memory left. My
implementation doesn't cache field values but loads them from the respective
documents on the fly - so it's slower but takes less memory. The query
mentioned took about 3 s (with rather small sorted field values - about
20-100 chars).
There's also a limitation - my implementation requires the sorted field to be
"stored" in the index (Field.Store.YES in doc.add()).
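[Editorial note: the trade-off described here can be sketched in plain Java. The loader below stands in for reading a stored field from a document and is purely illustrative, not the patch's actual code:]

```java
import java.util.Arrays;
import java.util.function.IntFunction;

public class SortStrategies {
    // Default-style: read every value once up front and keep it cached.
    static Integer[] sortWithCache(int numDocs, IntFunction<String> load) {
        String[] cache = new String[numDocs];           // O(numDocs) memory
        for (int d = 0; d < numDocs; d++) cache[d] = load.apply(d);
        Integer[] order = order(numDocs);
        Arrays.sort(order, (a, b) -> cache[a].compareTo(cache[b]));
        return order;
    }

    // Patch-style: no cache; load values on the fly for each comparison.
    static Integer[] sortOnTheFly(int numDocs, IntFunction<String> load) {
        Integer[] order = order(numDocs);
        Arrays.sort(order, (a, b) -> load.apply(a).compareTo(load.apply(b)));
        return order;                                   // slower, less memory
    }

    static Integer[] order(int n) {
        Integer[] o = new Integer[n];
        for (int i = 0; i < n; i++) o[i] = i;
        return o;
    }

    public static void main(String[] args) {
        // Stand-in for a stored-field read; doc d's value is "file-<9-d>".
        IntFunction<String> loadStoredValue = d -> "file-" + (9 - d);
        System.out.println(Arrays.toString(sortWithCache(3, loadStoredValue)));
        System.out.println(Arrays.toString(sortOnTheFly(3, loadStoredValue)));
    }
}
```

[Both strategies produce the same order; the cached one trades memory proportional to the doc count for fewer reads, the on-the-fly one trades repeated reads for near-constant memory.]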

NB> Bublic, can you tell me what exactly I need to do if I want to use this patch?

You can include the StoredFieldSortFactory class source file in your sources
and then use StoredFieldSortFactory.create(sortFieldName, sortDescending) to
get a Sort object for the sorted query.
The StoredFieldSortFactory source file can be extracted from the LUCENE-769
patch or from the sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java

Regards,
Artem

NB> thanks
NB> Nilesh

NB> On 4/6/07, Bublic Online <ab...@gmail.com> wrote:
>> Hi Ivan, Chris and all!
>>
>> I'm that contributor of LUCENE-769 and I recommend it too :)
>> OutOfMemory error was one of main reasons for me to make it.
>>
>> Regards,
>> Artem Vasiliev
>>
>> On 4/6/07, Chris Hostetter <ho...@fucit.org> wrote:
>> >
>> >
>> > : The problem I suspect is the sorting. As I understand, Lucene
>> > : builds internal caches for sorting and I suspect that this is the root
>> > : of your problem. You can test this by trying your problem queries
>> > : without sorting.
>> >
>> > if Sorting really is the cause of your problems, you may want to try out
>> > this patch...
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-769
>> >
>> > ...it *may* be advantageous in situations where memory is your most
>> > constrained resource and you are willing to sacrifice speed for sorting
>> > ... it looks promising to me, but there haven't been any convincing
>> > use cases/benchmarks of people finding it beneficial (other than the
>> > original contributor)
>> >
>> > if you do try it, please post your comments in the issue.
>> >
>> >
>> >
>> > -Hoss
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>





-- 
Best regards,
 Artem                            mailto:abublic@gmail.com




Re: Re[2]: Out of memory exception for big indexes

Posted by Erick Erickson <er...@gmail.com>.
It *is* a bit confusing, since every search is sorted, kinda....

Practically, a sorted query is one where you call one of the search
methods (on, say, Searcher) with a Sort object, which sorts on one
or more of the fields in your index (which fields are used is
specified in the Sort object's array of SortFields).

Searches that do NOT pass a Sort object default to relevance
ranking, which is not nearly so memory-intensive - that is, after
all, just one float or so per document....

The difference is that the fields referenced in the Sort object
have to be read into memory and compared against all the other
values, and the aggregate may be quite large memory-wise.
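[Editorial note: the distinction can be illustrated outside of Lucene - relevance ranking orders hits by a single float score, while a field sort must have the field's value in hand for every hit it compares. A minimal plain-Java sketch, not Lucene code:]

```java
import java.util.Arrays;

public class SortKinds {
    // Relevance ranking: order hits by score, one float per hit.
    static Integer[] byRelevance(float[] score) {
        Integer[] order = ids(score.length);
        Arrays.sort(order, (a, b) -> Float.compare(score[b], score[a]));
        return order;
    }

    // Field sort: order hits by a field value, which must be available
    // (read into memory) for every candidate document.
    static Integer[] byField(String[] value) {
        Integer[] order = ids(value.length);
        Arrays.sort(order, (a, b) -> value[a].compareTo(value[b]));
        return order;
    }

    static Integer[] ids(int n) {
        Integer[] o = new Integer[n];
        for (int i = 0; i < n; i++) o[i] = i;
        return o;
    }

    public static void main(String[] args) {
        float[] score = {0.9f, 0.4f, 0.7f};          // one float per hit
        String[] name = {"zebra", "apple", "mango"}; // needed for field sort
        System.out.println(Arrays.toString(byRelevance(score))); // [0, 2, 1]
        System.out.println(Arrays.toString(byField(name)));      // [1, 2, 0]
    }
}
```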

Erick

On 4/8/07, Nilesh Bansal <ni...@gmail.com> wrote:
>
> On 4/8/07, Artem <ab...@gmail.com> wrote:
> > I must note that my patch only helps in lucene-OOM situations related to
> > _sorted_ queries. If this is your case than I think yes it will help.
> Probably a newbie question, but can you please explain what sorted
> queries mean? Is simple keyword search a sorted query?
>

Re[4]: Out of memory exception for big indexes

Posted by Artem <ab...@gmail.com>.
Hello Nilesh,

Sunday, April 8, 2007, 10:58:32 PM, you wrote:

[talkin' about LUCENE-769]

>> I must note that my patch only helps in lucene-OOM situations related to
>> _sorted_ queries. If this is your case than I think yes it will help.
NB> Probably a newbie question, but can you please explain what sorted
NB> queries mean? Is simple keyword search a sorted query?

That's simple - if the results presented on screen are sorted by that field,
it's a sorted query :)
Another test is your system's code: by sorted queries I mean calls to
IndexSearcher.search(query, sort).

-- 
Best regards,
 Artem                            mailto:abublic@gmail.com




Re: Re[2]: Out of memory exception for big indexes

Posted by Nilesh Bansal <ni...@gmail.com>.
On 4/8/07, Artem <ab...@gmail.com> wrote:
> I must note that my patch only helps in lucene-OOM situations related to
> _sorted_ queries. If this is your case than I think yes it will help.
Probably a newbie question, but can you please explain what sorted
queries mean? Is simple keyword search a sorted query?


-- 
Nilesh Bansal.
http://queens.db.toronto.edu/~nilesh/



Re[2]: Out of memory exception for big indexes

Posted by Artem <ab...@gmail.com>.
Hello Ivan,

That was cool news! Thanks! :) The timings are surprisingly good - 10 mln docs
sorted in 20 s.. cool! It also looks like the sorting algorithm employed by
Lucene is quite memory-economic.

Not supporting multiple fields is in fact another limitation of my patch. I
don't need it, so I didn't implement it :) To implement it you would probably
do it manually - employ a FieldSelector fetching that bunch of fields, and
change the compare(ScoreDoc scoreDoc1, ScoreDoc scoreDoc2) method so that it
compares docs by a bunch of fields instead of a single field (there would also
have to be another array of Asc/Desc flags somewhere, which makes this more
complicated); that's it.
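[Editorial note: a plain-Java sketch of the multi-field comparison outlined above - the class and method names are illustrative, not part of the LUCENE-769 patch:]

```java
public class MultiFieldCompare {
    /**
     * Compare two docs field by field; descending[i] flips field i's order.
     * Each doc's values array would hold its sort-field values, e.g. as
     * fetched in one read with a FieldSelector over all the sort fields.
     */
    static int compare(String[] doc1, String[] doc2, boolean[] descending) {
        for (int i = 0; i < doc1.length; i++) {
            int c = doc1[i].compareTo(doc2[i]);
            if (descending[i]) c = -c;
            if (c != 0) return c;        // first differing field decides
        }
        return 0;                        // equal on all sort fields
    }

    public static void main(String[] args) {
        String[] a = {"smith", "2007-04-24"};
        String[] b = {"smith", "2007-04-06"};
        boolean[] desc = {false, true};  // name ascending, date descending
        System.out.println(compare(a, b, desc)); // negative: a sorts first
    }
}
```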

I don't understand yet why Sort(SortField[] fields) didn't give the same
result when fields.length == 1.. Probably we should dig into the Lucene code
to find out.
In the case of several fields I can imagine why this approach would be less
effective: at least N*2 document reads (by StoredFieldComparator.sortValue)
would be needed to compare 2 documents (N being the length of the fields
array). One read with an appropriate FieldSelector is likely to perform
better.

Anyway, I do think StoredFieldSortFactory's approach could be successfully
applied to multiple fields, but I'm not going to implement it yet. Maybe you?
:)

Regards,
Artem






-- 
Best regards,
 Artem                            mailto:abublic@gmail.com




Re: Out of memory exception for big indexes

Posted by Ivan Vasilev <iv...@sirma.bg>.
Hi Artem,

Thank you very much for your mails :)
So first I have to tell you that your patch works perfectly even with
very big indexes - 40 GB (you can see the results below).
The reason I had bad test results last time is that I made a small
change (though I cannot understand why this change caused the problem -
in my opinion it should not have such a big effect on performance).
The change I made is this - I added a new method to the class
StoredFieldSortFactory. It is the same as the create(String sortFieldName,
boolean sortDescending) method, but instead of wrapping the SortField it
returns it directly, and in my class I wrap that object in a Sort. Here
is the code:

public static SortField createSortField(String sortFieldName,
                                        boolean sortDescending) {
    return new SortField(sortFieldName, instance, sortDescending);
}

I do this because we have to support sorting on multiple fields, so I
obtain all the SortField objects in a loop and then create a Sort out
of them:

Sort sort = new Sort(sortFields);

In the tests that gave very bad results (search times of more than
5 mins) I sorted ONLY BY ONE FIELD in every test (meaning the array
sortFields always had length 1).
But I still used the constructor Sort(SortField[]) rather than
Sort(SortField), as originally in your code in the method
StoredFieldSortFactory.create(..).
Do you think this is the reason for the poor performance?

If so, COULD YOU PLEASE TELL ME how to use your patch for sorting on
multiple stored fields?

Here are the test results of your patch with different indexes (the tests
use the code just as you recommend - with your create(..) method that
uses the constructor Sort(SortField)):

- CPU - Intel Core2Duo; max memory allowed to the searching process -
1 GB (not all of it used)
**********************************************************************************************************
- index size 3,3 GB, about 486 410 documents (all the testing searches 
include all documents);

____________________________________________________________________________________________

- field size - it is the file name and varies - in my opinion 15 - 30 chars
average.
- search time (ASC) - 1,312 s, memory usage - 71MB
- search time (DSC) - 1,281 s, memory usage - 71MB

- field size - it is the abs path name and varies - in my opinion 60 - 90
chars average.
- search time (ASC) - 1,344 s, memory usage - 71MB
- search time (DSC) - 1,328 s, memory usage - 71MB

- field size - it is the file size and varies - in my opinion 3 - 7 chars
average.
- search time (ASC) - 1,313 s, memory usage - 71MB
- search time (DSC) - 1,312 s, memory usage - 71MB

**********************************************************************************

- index size 21,4 GB, about 376 999 documents (all the testing searches 
include all documents);
____________________________________________________________________________________________

- field size - it is the file name and varies - in my opinion 15 - 30 chars
average.
- search time (ASC) - 0,875 s, memory usage - 371MB
- search time (DSC) - 0,828 s, memory usage - 371MB

- field size - it is the abs path name and varies - in my opinion 60 - 90
chars average.
- search time (ASC) - 0,844 s, memory usage - 371MB
- search time (DSC) - 0,813 s, memory usage - 371MB

- field size - it is the file size and varies - in my opinion 3 - 7 chars
average.
- search time (ASC) - 0,813 s, memory usage - 371MB
- search time (DSC) - 0,797 s, memory usage - 371MB

**********************************************************************************

- index size 42,9 GB, about 10 944 918 documents (all the testing 
searches include all documents);
____________________________________________________________________________________________

- field size - it is the file name and varies - in my opinion 15 - 30 chars
average.
- search time (ASC) - 21,905 s, memory usage - 625MB
- search time (DSC) - 21,781 s, memory usage - 625MB

- field size - it is the abs path name and varies - in my opinion 60 - 90
chars average.
- search time (ASC) - 21,874 s, memory usage - 625MB
- search time (DSC) - 21,749 s, memory usage - 625MB

- field size - it is the file size and varies - in my opinion 3 - 7 chars
average.
- search time (ASC) - 21,687 s, memory usage - 625MB
- search time (DSC) - 21,812 s, memory usage - 625MB


THANK YOU VERY MUCH,
Ivan




Artem Vasiliev wrote:
> Hello Ivan!
>
> It's so sad to me that you had bad results with that patch. :)
>
> The discussion in the ticket is out-of-date - the patch initially spanned
> several classes and used a WeakHashMap, but it has since evolved into what
> it is now - one StoredFieldSortFactory class. I use it in my sharehound app
> in pretty much the same form it is in Jira currently, and it does show good
> results for me.
>
> In your sample searches,
> - how many results do you have?
> - how long does the sorted search execute?
> - what is the average size of a sorted field?
> - what is the CPU and how much of it and memory you give to the 
> application?
>
> I get page 1 (first 100 items) of a sorted list of 10000 items in 0.3 s to
> 3 s (for the date column it depends exactly on whether the sort is
> ascending or descending - I don't know why that is). My index is about
> 1 mln docs and 1 GB; the sorted fields are rather small (numbers, dates and
> strings of maybe 50 chars average). The machine looks quite beefy to me -
> an Intel Core Duo with 500M given to the application.
>
> Regards,
> Artem
>
> On 4/23/07, Ivan Vasilev <iv...@sirma.bg> wrote:
>>
>> Hi All,
>> THANK YOU FOR YOUR HELP :)
>> I put this problem in the forum but I had no chance to work on it last
>> week unfortunately...
>> So now I have tested Artem's patch, but the results show:
>> 1) speed is very slow compared with usage without the patch
>> 2) there is not a very big difference in memory usage (I have tested till
>> now only with relatively small indexes - less than 1 GB and less than
>> 1 mln docs, because when using it with 20-40 GB indexes I had to wait
>> more than 5 mins, which is practically useless).
>>
>> So I have doubts about whether I am using the patch correctly. I do just
>> what is described in Artem's letter:
>>
>> AV> You can include StoredFieldSortFactory class source file into your
>> sources and
>> AV> then use StoredFieldSortFactory.create(sortFieldName, 
>> sortDescending)
>> to get
>> AV> Sort object for sorting query.
>> AV> StoredFieldSortFactory source file can be extracted from LUCENE-769
>> patch or
>> AV> from sharehound sources:
>> http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java 
>>
>>
>>
>> What I am wondering about is that in the patch comments
>> (https://issues.apache.org/jira/browse/LUCENE-769) I see it written that
>> the patch solves the problem by using a WeakHashMap, but in the downloaded
>> StoredFieldSortFactory.java file no WeakHashMap is actually used. Another
>> thing: in the comments on the LUCENE-769 issue there is mention of classes
>> like WeakDocumentsCache and DocCachingIndexReader, but I did not find them
>> in the Lucene source code nor as classes in StoredFieldSortFactory.java.
>> So my questions are:
>> 1. Is it enough to include the file StoredFieldSortFactory.java in the
>> source code, or are there also other classes that I have to download and
>> include?
>> 2. Do I have to use this DocCachingIndexReader instead of the Reader that
>> I currently use in cases when I expect an OOMException and will use this
>> patch?
>>
>> Thanks to all once again :),
>> Ivan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>




Re: Out of memory exception for big indexes

Posted by Artem Vasiliev <ab...@gmail.com>.
Hi Ivan!

btw maybe forbidding the sorted search when there are too many results is an
option? I did it that way in my case.

Regards,
Artem.
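[Editorial note: the guard Artem suggests - refuse the sorted search when the result set is too large - could be as simple as the sketch below; the class name and threshold value are made up for illustration:]

```java
public class SortGuard {
    // Illustrative cutoff; in practice it would be tuned to the heap size
    // and the average size of the sort field's values.
    static final int MAX_SORTABLE_HITS = 100000;

    /** Decide whether a sorted search is allowed for this many hits. */
    static boolean allowSortedSearch(int hitCount) {
        return hitCount <= MAX_SORTABLE_HITS;
    }

    public static void main(String[] args) {
        System.out.println(allowSortedSearch(10000));    // true
        System.out.println(allowSortedSearch(15000000)); // false
    }
}
```

[The application would fall back to relevance ranking, or report an error, whenever the guard returns false.]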

On 4/24/07, Artem Vasiliev <ab...@gmail.com> wrote:
>
> Ahhh, you said in your original post that your search matches _all_ the
> results.. Yup, my patch will not help much in this case - after all, all
> the values have to be read to be compared while sorting! :)
>
> The LUCENE-769 patch helps only if the result set is significantly smaller
> than the full index.
>
> Regards,
> Artem
>
> On 4/24/07, Artem Vasiliev <ab...@gmail.com> wrote:
> >
> > Hello Ivan!
> >
> > It's so sad to me that you had bad results with that patch. :)
> >
> > The discussion in the ticket is out-of-date - the patch was initially in
> > several classes, used WeakHashMap but then it evolved to what it's now - one
> > StoredFieldSortFactory class. I use it in my sharehound app in pretty much
> > the same the form it is in Jira currently and it does show good results to
> > me.
> >
> > In your sample searches,
> > - how many results do you have?
> > - how long does the sorted search execute?
> > - what is the average size of a sorted field?
> > - what is the CPU and how much of it and memory you give to the
> > application?
> >
> > I get page 1 (first 100 items) of a sorted list with 10000 items in 0.3s
> > to 3s (for the date column it depends on whether the sort is ascending
> > or descending - I don't know why that is). My index is about 1mln docs and
> > 1G; sorted fields are rather small (numbers, dates and strings of maybe 50
> > characters on average). The machine looks quite beefy to me - an Intel
> > Core Duo with 500M given to the application.
> >
> > Regards,
> > Artem
> >
> > On 4/23/07, Ivan Vasilev < ivasilev@sirma.bg> wrote:
> > >
> > > Hi All,
> > > THANK YOU FOR YOUR HELP :)
> > > I put this problem in the forum but I had no chance to work on it last
> > > week, unfortunately...
> > > So now I have tested Artem's patch, but the results show:
> > > 1) speed is very slow compared with usage without the patch
> > > 2) there is not much difference in memory usage (I have tested so far
> > > only with relatively small indexes - less than 1 GB and less than 1
> > > million docs - because with the 20-40 GB indexes I had to wait more
> > > than 5 minutes, which is practically useless).
> > >
> > > So I doubt whether I am using the patch correctly. I did just what is
> > > described in Artem's letter:
> > >
> > > AV> You can include StoredFieldSortFactory class source file into your
> > > sources and
> > > AV> then use StoredFieldSortFactory.create(sortFieldName,
> > > sortDescending) to get
> > > AV> Sort object for sorting query.
> > > AV> StoredFieldSortFactory source file can be extracted from
> > > LUCENE-769 patch or
> > > AV> from sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java
> > >
> > >
> > >
> > > What I am wondering about is that in the patch comments
> > > (https://issues.apache.org/jira/browse/LUCENE-769 ) I see it written
> > > that the patch solves the problem by using WeakHashMap, but actually
> > > WeakHashMap is not used in the downloaded StoredFieldSortFactory.java
> > > file. Another thing: the comments in the LUCENE-769 issue mention
> > > classes like WeakDocumentsCache and DocCachingIndexReader, but I did
> > > not find them in the Lucene source code nor as classes in
> > > StoredFieldSortFactory.java. So my questions are:
> > > 1. Is it enough to include the file StoredFieldSortFactory.java in the
> > > source code, or are there other classes that I have to download and
> > > include?
> > > 2. Do I have to use this DocCachingIndexReader instead of the Reader I
> > > currently use in cases when I expect an OOMException and will use this
> > > patch?
> > >
> > > Thanks to all once again :),
> > > Ivan
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
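A minimal sketch of the guard Artem suggests (the class name and threshold below are illustrative assumptions, not Lucene API): run the cheap unsorted search first, and only allow the Sort-based search when the hit count is small enough for sorting to be safe.

```java
// Hypothetical guard against OOM on sorted searches: only sort when the
// unsorted hit count is below a chosen threshold. SortGuard and
// MAX_SORTABLE_HITS are illustrative names, not part of Lucene.
public class SortGuard {
    static final int MAX_SORTABLE_HITS = 100_000;

    /** Decide whether a sorted search should be attempted at all. */
    public static boolean allowSortedSearch(int unsortedHitCount) {
        return unsortedHitCount <= MAX_SORTABLE_HITS;
    }

    public static void main(String[] args) {
        System.out.println(allowSortedSearch(10_000));     // small result set
        System.out.println(allowSortedSearch(12_000_000)); // matches whole index
    }
}
```

The threshold would have to be tuned to the available heap; the unsorted hit count itself is cheap to obtain because no field values need to be loaded for it.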

Re: Out of memory exception for big indexes

Posted by Artem Vasiliev <ab...@gmail.com>.
Ahhh, you said in your original post that your search matches _all_ the
results... Yup, my patch will not help much in this case - after all, all the
values have to be read to be compared while sorting! :)

The LUCENE-769 patch helps only if the result set is significantly smaller
than the full index size.

Regards,
Artem

On 4/24/07, Artem Vasiliev <ab...@gmail.com> wrote:
>
> Hello Ivan!
>
> It's so sad to me that you had bad results with that patch. :)
>
> The discussion in the ticket is out-of-date - the patch initially spanned
> several classes and used WeakHashMap, but it has since evolved into what it
> is now: a single StoredFieldSortFactory class. I use it in my sharehound app
> in pretty much the same form it is in Jira currently, and it shows good
> results for me.
>
> In your sample searches,
> - how many results do you have?
> - how long does the sorted search execute?
> - what is the average size of a sorted field?
> - what is the CPU and how much of it and memory you give to the
> application?
>
> I get page 1 (first 100 items) of a sorted list with 10000 items in 0.3s to
> 3s (for the date column it depends on whether the sort is ascending or
> descending - I don't know why that is). My index is about 1mln docs and 1G;
> sorted fields are rather small (numbers, dates and strings of maybe 50
> characters on average). The machine looks quite beefy to me - an Intel Core
> Duo with 500M given to the application.
>
> Regards,
> Artem
>
> On 4/23/07, Ivan Vasilev <iv...@sirma.bg> wrote:
> >
> > Hi All,
> > THANK YOU FOR YOUR HELP :)
> > I put this problem in the forum but I had no chance to work on it last
> > week, unfortunately...
> > So now I have tested Artem's patch, but the results show:
> > 1) speed is very slow compared with usage without the patch
> > 2) there is not much difference in memory usage (I have tested so far
> > only with relatively small indexes - less than 1 GB and less than 1
> > million docs - because with the 20-40 GB indexes I had to wait more
> > than 5 minutes, which is practically useless).
> >
> > So I doubt whether I am using the patch correctly. I did just what is
> > described in Artem's letter:
> >
> > AV> You can include StoredFieldSortFactory class source file into your
> > sources and
> > AV> then use StoredFieldSortFactory.create(sortFieldName,
> > sortDescending) to get
> > AV> Sort object for sorting query.
> > AV> StoredFieldSortFactory source file can be extracted from LUCENE-769
> > patch or
> > AV> from sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java
> >
> >
> >
> > What I am wondering about is that in the patch comments
> > (https://issues.apache.org/jira/browse/LUCENE-769) I see it written that
> > the patch solves the problem by using WeakHashMap, but actually
> > WeakHashMap is not used in the downloaded StoredFieldSortFactory.java
> > file. Another thing: the comments in the LUCENE-769 issue mention classes
> > like WeakDocumentsCache and DocCachingIndexReader, but I did not find
> > them in the Lucene source code nor as classes in
> > StoredFieldSortFactory.java. So my questions are:
> > 1. Is it enough to include the file StoredFieldSortFactory.java in the
> > source code, or are there other classes that I have to download and
> > include?
> > 2. Do I have to use this DocCachingIndexReader instead of the Reader I
> > currently use in cases when I expect an OOMException and will use this
> > patch?
> >
> > Thanks to all once again :),
> > Ivan
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Re: Out of memory exception for big indexes

Posted by Artem Vasiliev <ab...@gmail.com>.
Hello Ivan!

It's so sad to me that you had bad results with that patch. :)

The discussion in the ticket is out-of-date - the patch initially spanned
several classes and used WeakHashMap, but it has since evolved into what it
is now: a single StoredFieldSortFactory class. I use it in my sharehound app
in pretty much the same form it is in Jira currently, and it shows good
results for me.

In your sample searches,
- how many results do you have?
- how long does the sorted search execute?
- what is the average size of a sorted field?
- what is the CPU and how much of it and memory you give to the application?

I get page 1 (first 100 items) of a sorted list with 10000 items in 0.3s to 3s
(for the date column it depends on whether the sort is ascending or
descending - I don't know why that is). My index is about 1mln docs and 1G;
sorted fields are rather small (numbers, dates and strings of maybe 50
characters on average). The machine looks quite beefy to me - an Intel Core
Duo with 500M given to the application.

Regards,
Artem

On 4/23/07, Ivan Vasilev <iv...@sirma.bg> wrote:
>
> Hi All,
> THANK YOU FOR YOUR HELP :)
> I put this problem in the forum but I had no chance to work on it last
> week, unfortunately...
> So now I have tested Artem's patch, but the results show:
> 1) speed is very slow compared with usage without the patch
> 2) there is not much difference in memory usage (I have tested so far
> only with relatively small indexes - less than 1 GB and less than 1 million
> docs - because with the 20-40 GB indexes I had to wait more
> than 5 minutes, which is practically useless).
>
> So I doubt whether I am using the patch correctly. I did just what is
> described in Artem's letter:
>
> AV> You can include StoredFieldSortFactory class source file into your
> sources and
> AV> then use StoredFieldSortFactory.create(sortFieldName, sortDescending)
> to get
> AV> Sort object for sorting query.
> AV> StoredFieldSortFactory source file can be extracted from LUCENE-769
> patch or
> AV> from sharehound sources:
> http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java
>
>
> What I am wondering about is that in the patch comments
> (https://issues.apache.org/jira/browse/LUCENE-769) I see it written that
> the patch solves the problem by using WeakHashMap, but actually
> WeakHashMap is not used in the downloaded StoredFieldSortFactory.java file.
> Another thing: the comments in the LUCENE-769 issue mention classes like
> WeakDocumentsCache and DocCachingIndexReader, but I did not find them in
> the Lucene source code nor as classes in StoredFieldSortFactory.java. So my
> questions are:
> 1. Is it enough to include the file StoredFieldSortFactory.java in the
> source code, or are there other classes that I have to download and
> include?
> 2. Do I have to use this DocCachingIndexReader instead of the Reader I
> currently use in cases when I expect an OOMException and will use this patch?
>
> Thanks to all once again :),
> Ivan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Out of memory exception for big indexes

Posted by Ivan Vasilev <iv...@sirma.bg>.
Hi All,
THANK YOU FOR YOUR HELP :)
I put this problem in the forum but I had no chance to work on it last 
week, unfortunately...
So now I have tested Artem's patch, but the results show:
1) speed is very slow compared with usage without the patch
2) there is not much difference in memory usage (I have tested so far
only with relatively small indexes - less than 1 GB and less than 1 million
docs - because with the 20-40 GB indexes I had to wait more
than 5 minutes, which is practically useless).

So I doubt whether I am using the patch correctly. I did just what is
described in Artem's letter:

AV> You can include StoredFieldSortFactory class source file into your sources and
AV> then use StoredFieldSortFactory.create(sortFieldName, sortDescending) to get
AV> Sort object for sorting query.
AV> StoredFieldSortFactory source file can be extracted from LUCENE-769 patch or
AV> from sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java


What I am wondering about is that in the patch comments
(https://issues.apache.org/jira/browse/LUCENE-769) I see it written that
the patch solves the problem by using WeakHashMap, but actually
WeakHashMap is not used in the downloaded StoredFieldSortFactory.java file.
Another thing: the comments in the LUCENE-769 issue mention classes like
WeakDocumentsCache and DocCachingIndexReader, but I did not find them in
the Lucene source code nor as classes in StoredFieldSortFactory.java. So my
questions are:
1. Is it enough to include the file StoredFieldSortFactory.java in the
source code, or are there other classes that I have to download and
include?
2. Do I have to use this DocCachingIndexReader instead of the Reader I
currently use in cases when I expect an OOMException and will use this patch?

Thanks to all once again :),
Ivan



Re[2]: Out of memory exception for big indexes

Posted by Artem <ab...@gmail.com>.
Hello Nilesh,

Sunday, April 8, 2007, 9:03:06 AM, you wrote:

NB> This seems like a very useful patch. Our application searches over 50
NB> million docs in a 40GB index. We only have simple conjunctive queries
NB> on a single field. Currently, the command line search program that
NB> prints top-10 results requires at least 200mb memory. Our web
NB> application, that searches the same index crashes with OOM when there
NB> are more than 10-12 concurrent requests (heap size set to 3GB). Will
NB> this patch help in such a situation?

I must note that my patch only helps in Lucene OOM situations related to
_sorted_ queries. If this is your case then I think it will help, yes.

In my app the index is currently not so big, only 1mln docs. With the patch
applied, a sample query returning the first 30 of 120,000 sorted results made
memory consumption jump only from 18M to 20M according to jconsole.

NB> It seems that there are some issues with this patch and that was the
NB> reason it is not yet in the main source tree. Can someone please
NB> summarize what the downsides of using such an approach are. It will be
NB> really good if Lucene had it in main source tree and a flag to turn ON
NB> or OFF this feature.

First, there's a performance cost (for the second and further queries with the
same IndexSearcher). In the default implementation all the index values of the
sorted field are cached during the first sorted search - this takes memory and
time, but subsequent queries run fast if there is still some memory left. My
implementation doesn't cache field values but loads them from the respective
documents on the fly - so it's slower but takes less memory. The query
mentioned took about 3s (with rather small sorted field values - about 20-100
chars). There's also a limitation - my implementation requires the sorted
field to be "stored" in the index (Field.Store.YES in doc.add()).
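As a toy illustration of the trade-off described above (this is a standalone model, not Lucene's actual code): the default approach materializes every field value in memory once, while the stored-field approach re-reads the value on each comparison.

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy model (NOT Lucene code) of the trade-off: sorting can either cache
// every field value up front (default FieldCache style: fast compares,
// memory proportional to numDocs) or fetch the stored value on every
// comparison (LUCENE-769 style: little extra memory, slower compares).
public class SortTradeoff {

    // Stand-in for reading one document's stored field value;
    // in Lucene this would be a disk read from the index.
    static String loadStoredValue(String[] store, int docId) {
        return store[docId];
    }

    /** Default-style: copy all values into a cache once, compare from RAM. */
    static Integer[] sortWithCache(String[] store, Integer[] docIds) {
        final String[] cache = store.clone(); // memory grows with numDocs
        Arrays.sort(docIds, Comparator.comparing((Integer d) -> cache[d]));
        return docIds;
    }

    /** Patch-style: no cache; load each value again on every comparison. */
    static Integer[] sortOnTheFly(String[] store, Integer[] docIds) {
        Arrays.sort(docIds,
                Comparator.comparing((Integer d) -> loadStoredValue(store, d)));
        return docIds;
    }

    public static void main(String[] args) {
        Integer[] order = sortOnTheFly(new String[]{"beta", "alpha", "gamma"},
                                       new Integer[]{0, 1, 2});
        System.out.println(Arrays.toString(order)); // doc ids in sorted order
    }
}
```

Both strategies produce the same ordering; they differ only in where the values live during the sort, which is exactly why the patch trades speed for memory.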

NB> Bublic, can you tell me what exactly I need to do if I want to use this patch?

You can include StoredFieldSortFactory class source file into your sources and
then use StoredFieldSortFactory.create(sortFieldName, sortDescending) to get
Sort object for sorting query.
StoredFieldSortFactory source file can be extracted from LUCENE-769 patch or
from sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java

Regards,
Artem

NB> thanks
NB> Nilesh

NB> On 4/6/07, Bublic Online <ab...@gmail.com> wrote:
>> Hi Ivan, Chris and all!
>>
>> I'm that contributor of LUCENE-769 and I recommend it too :)
>> OutOfMemory error was one of main reasons for me to make it.
>>
>> Regards,
>> Artem Vasiliev
>>
>> On 4/6/07, Chris Hostetter <ho...@fucit.org> wrote:
>> >
>> >
>> > : The problem I suspect is the sorting. As I understand, Lucene
>> > : builds internal caches for sorting and I suspect that this is the root
>> > : of your problem. You can test this by trying your problem queries
>> > : without sorting.
>> >
>> > if Sorting really is the cause of your problems, you may want to try out
>> > this patch...
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-769
>> >
>> > ...it *may* be advantageous in situations where memory is your most
>> > constrained resource, and you are willing to sacrifice speed for sorting
>> > ... it looks promising to me, but there haven't been any convincing
>> > use cases/benchmarks of people finding it beneficial (other than the
>> > original contributor)
>> >
>> > if you do try it, please post your comments in the issue.
>> >
>> >
>> >
>> > -Hoss
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>





-- 
Best regards,
 Artem                            mailto:abublic@gmail.com




Re: Out of memory exception for big indexes

Posted by Nilesh Bansal <ni...@gmail.com>.
This seems like a very useful patch. Our application searches over 50
million docs in a 40GB index. We only have simple conjunctive queries
on a single field. Currently, the command-line search program that
prints the top-10 results requires at least 200MB of memory. Our web
application, which searches the same index, crashes with OOM when there
are more than 10-12 concurrent requests (heap size set to 3GB). Will
this patch help in such a situation?

It seems that there are some issues with this patch, and that is the
reason it is not yet in the main source tree. Can someone please
summarize the downsides of using such an approach? It would be
really good if Lucene had it in the main source tree, with a flag to
turn this feature ON or OFF.

Bublic, can you tell me what exactly I need to do if I want to use this patch?

thanks
Nilesh

On 4/6/07, Bublic Online <ab...@gmail.com> wrote:
> Hi Ivan, Chris and all!
>
> I'm that contributor of LUCENE-769 and I recommend it too :)
> OutOfMemory error was one of main reasons for me to make it.
>
> Regards,
> Artem Vasiliev
>
> On 4/6/07, Chris Hostetter <ho...@fucit.org> wrote:
> >
> >
> > : The problem I suspect is the sorting. As I understand, Lucene
> > : builds internal caches for sorting and I suspect that this is the root
> > : of your problem. You can test this by trying your problem queries
> > : without sorting.
> >
> > if Sorting really is the cause of your problems, you may want to try out
> > this patch...
> >
> > https://issues.apache.org/jira/browse/LUCENE-769
> >
> > ...it *may* be advantageous in situations where memory is your most
> > constrained resource, and you are willing to sacrifice speed for sorting
> > ... it looks promising to me, but there haven't been any convincing
> > use cases/benchmarks of people finding it beneficial (other than the
> > original contributor)
> >
> > if you do try it, please post your comments in the issue.
> >
> >
> >
> > -Hoss
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>


-- 
Nilesh Bansal.
http://queens.db.toronto.edu/~nilesh/



Re: Out of memory exception for big indexes

Posted by Bublic Online <ab...@gmail.com>.
Hi Ivan, Chris and all!

I'm that contributor of LUCENE-769 and I recommend it too :)
OutOfMemory error was one of main reasons for me to make it.

Regards,
Artem Vasiliev

On 4/6/07, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : The problem I suspect is the sorting. As I understand, Lucene
> : builds internal caches for sorting and I suspect that this is the root
> : of your problem. You can test this by trying your problem queries
> : without sorting.
>
> if Sorting really is the cause of your problems, you may want to try out
> this patch...
>
> https://issues.apache.org/jira/browse/LUCENE-769
>
> ...it *may* be advantageous in situations where memory is your most
> constrained resource, and you are willing to sacrifice speed for sorting
> ... it looks promising to me, but there haven't been any convincing
> use cases/benchmarks of people finding it beneficial (other than the
> original contributor)
>
> if you do try it, please post your comments in the issue.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Out of memory exception for big indexes

Posted by Chris Hostetter <ho...@fucit.org>.
: The problem I suspect is the sorting. As I understand, Lucene
: builds internal caches for sorting and I suspect that this is the root
: of your problem. You can test this by trying your problem queries
: without sorting.

if Sorting really is the cause of your problems, you may want to try out
this patch...

https://issues.apache.org/jira/browse/LUCENE-769

...it *may* be advantageous in situations where memory is your most
constrained resource, and you are willing to sacrifice speed for sorting
... it looks promising to me, but there haven't been any convincing
use cases/benchmarks of people finding it beneficial (other than the
original contributor)

if you do try it, please post your comments in the issue.



-Hoss




Re: Out of memory exception for big indexes

Posted by Erick Erickson <er...@gmail.com>.
I can only shed a little light on a couple of points, see below.

On 4/6/07, Ivan Vasilev <iv...@sirma.bg> wrote:
>
> Hi All,
>
> I have the following problem - we have OutOfMemoryException when
> seraching on the indexes that are of size 20 - 40 GB and contain 10 - 15
> million docs.
> When we make searches we perform query that match all the results but we
> DO NOT fetch all the results - we fetch 100 of them. We also make
> sorting by using the class Sort and we really need result to be sorted
> on a field that is randomly defined by the user.
> So my questions are:


The problem I suspect is the sorting. As I understand, Lucene
builds internal caches for sorting and I suspect that this is the root
of your problem. You can test this by trying your problem queries
without sorting.

How much memory are you giving the JVM?


1) Does Lucene have some restrictions on the index size on which it can perform
> searches?


No theoretical ones that I know of, but practical ones at times. As
you are finding.

2) Is there some approach to estimate beforehand the RAM that will use
> Lucene for a certain query? I mean, what exactly does this memory
> usage depend on - the index size, the docs stored in the index, the size
> of these docs...



I'd like to know this myself. Hint, hint, hint....
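For what it's worth, a rough back-of-envelope sketch is possible if one assumes, as the sorting discussion in this thread suggests, that a String sort caches about one 4-byte ord per document plus each unique term as a Java String. All constants below are assumptions, not measured Lucene numbers.

```java
// Back-of-envelope estimate of sort-cache memory: one int ord per document
// plus the unique term strings. The 40-byte per-String overhead and
// 2 bytes/char are rough JVM assumptions, not measurements.
public class SortCacheEstimate {

    public static long estimateBytes(long numDocs, long uniqueTerms, int avgTermChars) {
        long ords = numDocs * 4L;                             // int[] of term ords
        long terms = uniqueTerms * (40L + 2L * avgTermChars); // cached String values
        return ords + terms;
    }

    public static void main(String[] args) {
        // e.g. 15 million docs sorted on a field with 10 million unique ~20-char terms
        long bytes = estimateBytes(15_000_000L, 10_000_000L, 20);
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```

Under these assumptions, the 10-15 million doc indexes discussed here would need hundreds of megabytes per sorted field, which is at least consistent with the OOMs reported.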

3) Is there some approach to control the RAM used? For example, when
> searching, not to exceed 1GB of used memory?
> 4) Is there some special approach to proceeding with such big indexes (we
> expect in near future even 60 -80 GB indexes).
>
>
> Best Regards,
> Ivan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>