Posted to solr-user@lucene.apache.org by Chris Laux <ch...@terraminds.com> on 2007/11/21 17:06:22 UTC

Memory use with sorting problem

Hi all,

I've been struggling with this problem for over a month now, and
although memory issues have been discussed often, I don't seem to be
able to find a fitting solution.

The index is only 1.5 GB, but memory use quickly fills the 1 GB heap
maximum on a 2 GB machine. Things then work fine until auto-warming
starts. Switching auto-warming off altogether is unattractive, as it
leads to response times of up to 30 s. When auto-warming starts, I get
this error:

> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.QueryResultKey
> @e0b93139:java.lang.OutOfMemoryError: Java heap space

Now when I reduce the size of the caches (to a fraction of the default
settings) and the number of warming searchers (to 2), memory use is not
reduced and the problem stays. Only deactivating auto-warming helps.
When I set the heap size limit higher (and go into swap space), all the
extra memory seems to be used up right away, independently of
auto-warming.

This all seems to be closely connected to sorting by a numerical field,
as switching this off does make memory use a lot more friendly.

Is it normal to need that much memory for such a small index?

I suspect the problem is in Lucene; would it be better to post on their
list?

Does anyone know a better way of getting the sorting done?

Thanks in advance for your help,

Chris


This is the field setup in schema.xml:

<field name="id" type="long" stored="true" required="true" multiValued="false" />
<field name="user-id" type="long" stored="true" required="true" multiValued="false" />
<field name="text" type="text" indexed="true" multiValued="false" />
<field name="created" type="slong" indexed="true" multiValued="false" />

And this is a sample query:

select/?q=solr&start=0&rows=20&sort=created+desc



Re: Memory use with sorting problem

Posted by Chris Laux <ch...@terraminds.com>.
Just wanted to add the solution to this problem, in case someone finds
the matching description in the archives (see below).

By reducing the granularity of the timestamp field (stored as slong)
from seconds to minutes, the number of unique values was reduced by an
order of magnitude (there are about 500,000 minutes in a year), and
hence memory use was also reduced.
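The fix can be sketched as follows; this is only an illustration of the idea (the class and variable names are made up, and the real change would happen wherever documents are prepared for indexing):

```java
// Illustration of the fix: truncate a seconds-resolution epoch
// timestamp to minute resolution before indexing, so the sort
// field ends up with roughly 60x fewer unique terms.
public class TruncateGranularity {

    static long toMinutes(long epochSeconds) {
        return epochSeconds / 60;
    }

    public static void main(String[] args) {
        long t1 = 1195661182L; // an epoch-seconds timestamp (Nov 2007)
        long t2 = t1 + 30;     // 30 s later, within the same minute
        System.out.println(toMinutes(t1) == toMinutes(t2)); // prints "true"
    }
}
```

Documents whose timestamps fall in the same minute now share one indexed term, which is what shrinks the sort cache.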

Chris




Re: Memory use with sorting problem

Posted by Chris Laux <ch...@terraminds.com>.
Hi again,

in the meantime I discovered the use of jmap (I'm not a Java programmer)
and found that all the memory was being used up by String and char[]
objects.

The Lucene docs have the following to say on sorting memory use:

> For String fields, the cache is larger: in addition to the above
> array, the value of every term in the field is kept in memory. If there
> are many unique terms in the field, this could be quite large.

(http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Sort.html)
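The cache the javadoc describes can be pictured as two parallel structures. Here is a toy model (not Lucene's actual classes, just an illustration of where the memory goes):

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of the string-field sort cache the javadoc describes:
// one int per document (the rank of its term) plus every unique
// term value itself kept on the heap.
public class StringSortCacheModel {
    final String[] lookup; // every unique term, sorted
    final int[] order;     // for each document, an index into lookup

    StringSortCacheModel(String[] docValues) {
        TreeSet<String> unique = new TreeSet<>(Arrays.asList(docValues));
        lookup = unique.toArray(new String[0]);
        order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, docValues[doc]);
        }
    }

    int uniqueTerms() {
        return lookup.length;
    }
}
```

The sort comparisons themselves only need the int[] array; it is the `lookup` array of Strings that grows with the number of unique terms, which is why second-granularity timestamps (nearly one distinct term per document) are so expensive.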

I am sorting on the "slong" schema type, which is of course stored as a
string. The above quote seems to indicate that it is possible for a
field not to be a string for the purposes of the sort, whereas I took it
from LiA (Lucene in Action) that everything is a string to Lucene.

What can I do to make sure the additional memory is not used for every
unique term? I.e., how do I keep the slong from being a "String field"?

Cheers,
Chris




Re: Memory use with sorting problem

Posted by Chris Laux <ch...@terraminds.com>.
Thanks for your reply. I made some memory-saving changes, as per your
advice, but the problem remains.

> Set the max warming searchers to 1 to ensure that you never have more
> than one warming at the same time.

Done.

> How many documents are in your index?

Currently about 8 million.

> If you don't need range queries on these numeric fields, you might try
> switching from "sfloat" to "float" and from "sint" to "int".  The
> fieldCache representation will be smaller.

As far as I can see, "slong" etc. are also needed for sorting queries
(which I do, as mentioned). In any case, I got an error message when I
tried sorting on a "long" field.

>> Is it normal to need that much memory for such a small index?
> 
> Some things are related more to the number of unique terms or the
> number of documents than to the "size" of the index.

Is there a manageable way to find out / limit the number of unique terms
in Solr?

Cheers,

Chris



Re: Memory use with sorting problem

Posted by Yonik Seeley <yo...@apache.org>.
On Nov 21, 2007 11:06 AM, Chris Laux <ch...@terraminds.com> wrote:
> Now when I reduce the size of caches (to a fraction of the default
> settings) and number of warming Searchers (to 2),

Set the max warming searchers to 1 to ensure that you never have more
than one warming at the same time.


> memory use is not
> reduced and the problem stays. Only deactivating auto-warming helps.
> When I set the heap size limit higher (and go into swap space), all the
> extra memory seems to be used up right away, independently of
> auto-warming.
>
> This all seems to be closely connected to sorting by a numerical field,
> as switching this off does make memory use a lot more friendly.

How many documents are in your index?

If you don't need range queries on these numeric fields, you might try
switching from "sfloat" to "float" and from "sint" to "int".  The
fieldCache representation will be smaller.

> Is it normal to need that much memory for such a small index?

Some things are related more to the number of unique terms or the
number of documents than to the "size" of the index.
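Rough arithmetic makes this concrete, using the ~8 million documents mentioned earlier in the thread. The 60-byte per-String heap cost is a ballpark guess for short Strings on JVMs of that era, not a measured figure:

```java
// Back-of-envelope estimate of a string-field sort cache:
// an int per document (term ordinal) plus one String object
// per unique term value.
public class SortCacheEstimate {

    static long estimateBytes(long docs, long uniqueTerms, long bytesPerString) {
        return docs * 4                         // int[] of term ordinals
             + uniqueTerms * bytesPerString;    // the term values themselves
    }

    public static void main(String[] args) {
        long docs = 8000000L;
        // Second granularity: nearly every timestamp is unique.
        System.out.println(estimateBytes(docs, docs, 60) / (1024 * 1024) + " MB");      // prints "488 MB"
        // Minute granularity: ~60x fewer unique terms.
        System.out.println(estimateBytes(docs, docs / 60, 60) / (1024 * 1024) + " MB"); // prints "38 MB"
    }
}
```

A further factor worth noting: while a new searcher is auto-warming, the old searcher's caches are still live, so the peak can approach twice the steady-state figure, which fits the out-of-memory errors appearing exactly when warming starts.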

-Yonik