Posted to solr-user@lucene.apache.org by Marcus Stratmann <st...@gmx.de> on 2006/06/14 12:25:45 UTC

OutOfMemory error while sorting

Hello,

I have a new problem with OutOfMemory errors.
As I reported before, we have an index with more than 10 million 
documents and 23 fields. Recently I added a new field which we will only 
use for sorting purposes (by "adding" I mean building a new index). But 
it turned out that every query using this field for sorting ends in an 
out of memory error. Even sorting result sets containing just one 
document does not work. The field is of type solr.StrField and, strangely
enough, there are some other fields in the index of the same type which
do not cause these problems (but not all of them; our uniqueKey-field 
has the same problems with sorting).
Now I am wondering why sorting works with some of the fields but not 
with others. Could it be that this depends on the content?

Thanks,
Marcus

Re: OutOfMemory error while sorting

Posted by Chris Hostetter <ho...@fucit.org>.
: nearly 100 percent and no queries were answered. I found out that
: "warming" the server with serial queries, not parallel ones, bypassed
: this problem (not to be confused with warming the caches!). So after a

Note that you can have Solr do this automatically for you in both
firstSearcher and newSearcher listeners (so you never risk having one of
your users hit the searcher before your warming queries).  Take a look at
the commented out usage of QuerySenderListener in the example
solrconfig.xml...

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
        <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      </arr>
    </listener>


-Hoss


Re: OutOfMemory error while sorting

Posted by Marcus Stratmann <st...@gmx.de>.
Hi,

Chris Hostetter wrote:
> This is a fairly typical Lucene issue (ie: not specific to Solr)...
Ah, I see. I should really pay more attention to Lucene. But when
working with Solr I sometimes forget about the underlying technology.

> Sorting on a field requires building a FieldCache for every document --
> regardless of how many documents match your query.  This cache is reused
> for all searches that sort on that field.
This makes things clear to me now. I always observed that Solr is slow 
after a commit or optimize. When I put a newly created or updated index
into service the server always seemed to hang up. The CPU usage went to 
nearly 100 percent and no queries were answered. I found out that 
"warming" the server with serial queries, not parallel ones, bypassed 
this problem (not to be confused with warming the caches!). So after a 
commit I sent some hundred queries from our log to the server and this 
worked fine. But now I know I only need a few specific queries to do the 
job.

Thanks Chris for the great support! The Solr team is doing a very good 
job. With your help I finally got Solr running. Our system is live now 
and I will now switch over to the "Who uses Solr" thread to give you 
some feedback.

Again, thank you very much!

Marcus

Re: OutOfMemory error while sorting

Posted by Yonik Seeley <ys...@gmail.com>.
On 6/14/06, Chris Hostetter <ho...@fucit.org> wrote:
> Off the top of my head, I don't remember if omitting norms for fields
> reduces the amount of resident memory needed by the index

It does indeed.  1 byte per document for the indexed field.
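
As a rough illustration of that figure, here is a minimal sketch. The class
and method names are hypothetical, and the 10 million document count is a
round number taken from the index size mentioned in this thread; it only
multiplies out "1 byte per document per indexed field with norms".

```java
public class NormsEstimate {
    /** Resident norms memory: one byte per document for each indexed field that keeps norms. */
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // assumed round figure for the index in this thread
        // Omitting norms on a single field frees about one byte per document.
        System.out.println("saved per field: " + normsBytes(docs, 1) + " bytes");
    }
}
```

Under that assumption, omitting norms on one field saves on the order of
10 MB of resident memory.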

-Yonik

Re: OutOfMemory error while sorting

Posted by Chris Hostetter <ho...@fucit.org>.
This is a fairly typical Lucene issue (ie: not specific to Solr)...

Sorting on a field requires building a FieldCache for every document --
regardless of how many documents match your query.  This cache is reused
for all searches that sort on that field.

For things like Integers and Floats, the size of the FieldCache is one
item (int/float, etc) per document.  For Strings, the size is one int
per document, plus the total size of every unique String value in the field.

This is why sorting on some String fields uses more memory than on other
String fields -- it all depends on how heterogeneous the values in that
field are.  A field that only contains 4 unique values takes up a lot less
room than a field where every document has a different value.
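
To put rough numbers on this, here is a back-of-the-envelope sketch.  The
class name, the 40-byte per-String overhead, the 2 bytes per character and
the average value length of 10 characters are all assumptions made for
illustration (real figures depend on the JVM and on the data); only the
"one int per document plus every unique value" rule comes from the
description above.

```java
public class FieldCacheEstimate {
    // Assumed per-String overhead in bytes (object and array headers); JVM-dependent.
    static final long STRING_OVERHEAD_BYTES = 40;

    /** Numeric sort field: one 4-byte int or float per document. */
    static long numericSortBytes(long numDocs) {
        return 4L * numDocs;
    }

    /** String sort field: one 4-byte int per document plus every unique value. */
    static long stringSortBytes(long numDocs, long uniqueValues, long avgChars) {
        // Each unique value is assumed to cost its overhead plus 2 bytes per character.
        return 4L * numDocs + uniqueValues * (STRING_OVERHEAD_BYTES + 2L * avgChars);
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // assumed round figure for the index in this thread
        System.out.println("int field:          " + numericSortBytes(docs) + " bytes");
        System.out.println("string, 4 values:   " + stringSortBytes(docs, 4, 10) + " bytes");
        System.out.println("string, all unique: " + stringSortBytes(docs, docs, 10) + " bytes");
    }
}
```

With these assumed constants, a String field with 4 unique values costs
about 40 MB for 10 million documents, while a field where every document
has a different value (a uniqueKey field, for instance) costs about 640 MB.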

In the end, there isn't much you can do about this except allocate more
memory to your JVM.  One option you do have is to tune other parameters
in Solr so that more of the memory you already have allocated to the JVM
is available for sorting (eg: making your filterCaches smaller).

Off the top of my head, I don't remember if omitting norms for fields
reduces the amount of resident memory needed by the index, or just the on
disk size, but you might want to try that also if there are fields you
know you don't need norms for (a String field you sort on is a good bet,
since you probably don't search on it, and even if you do the length is
always going to be 1)


: I have a new problem with OutOfMemory errors.
: As I reported before, we have an index with more than 10 million
: documents and 23 fields. Recently I added a new field which we will only
: use for sorting purposes (by "adding" I mean building a new index). But
: it turned out that every query using this field for sorting ends in an
: out of memory error. Even sorting result sets containing just one
: document does not work. The field is of type solr.StrField and, strangely
: enough, there are some other fields in the index of the same type which
: do not cause these problems (but not all of them; our uniqueKey-field
: has the same problems with sorting).
: Now I am wondering why sorting works with some of the fields but not
: with others. Could it be that this depends on the content?



-Hoss