Posted to solr-user@lucene.apache.org by Joe Pollard <jo...@bazaarvoice.com> on 2009/04/06 22:43:20 UTC

Coming up with a model of memory usage

To combat our frequent OutOfMemoryErrors, I'm attempting to come up
with a model so that we can determine how much memory to give Solr based
on how much data we have. This becomes more important as we expand
support to more data types.

Are there any published guidelines on how much memory a particular
document consumes, based on its field types, etc.?

I have several stored fields, numerous other non-stored fields, a
largish copyField destination field, and I am sorting on some indexed,
non-stored fields.

Any pointers would be appreciated!

Thanks,
-Joe


RE: Coming up with a model of memory usage

Posted by Joe Pollard <Jo...@bazaarvoice.com>.
It does end up in the right (sorted) order, but it's very expensive. Sorting by a couple of fields that each have fewer unique indexed values seems to limit memory consumption greatly.
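A minimal Python sketch of the cardinality effect described above, assuming timestamps formatted like '2008-08-12T12:18:26.510' and the split used in this thread (day as 'YYYY-MM-DD', time as 'HHMMSS' with milliseconds dropped). The helper names and the randomly generated timestamps are hypothetical. The point is only that each split field has a bounded number of distinct terms (at most 366 days in a year and 86,400 seconds in a day), while the full timestamp is close to unique per document.

```python
import random
from datetime import datetime, timedelta


def split_timestamp(ts: str) -> tuple[str, str]:
    """Hypothetical helper: '2008-08-12T12:18:26.510' -> ('2008-08-12', '121826').

    The day part is kept as-is; the time part becomes HHMMSS, so the
    milliseconds are dropped, as in the split described in this thread.
    """
    day, clock = ts.split("T")
    hhmmss = clock.split(".")[0].replace(":", "")
    return day, hhmmss


def unique_term_counts(timestamps):
    """Count the distinct terms each candidate sort field would contain."""
    full = set(timestamps)
    days = set()
    times = set()
    for ts in timestamps:
        day, hhmmss = split_timestamp(ts)
        days.add(day)
        times.add(hhmmss)
    return {"full": len(full), "day": len(days), "time": len(times)}


def synthetic_timestamps(n, seed=42):
    """Hypothetical data: n random millisecond-precision timestamps in 2008."""
    rng = random.Random(seed)
    start = datetime(2008, 1, 1)
    out = []
    for _ in range(n):
        t = start + timedelta(milliseconds=rng.randrange(366 * 86_400_000))
        out.append(t.strftime("%Y-%m-%dT%H:%M:%S.") + f"{t.microsecond // 1000:03d}")
    return out


if __name__ == "__main__":
    # The full field ends up with nearly one term per document; the split
    # fields stay bounded no matter how many documents are added.
    print(unique_term_counts(synthetic_timestamps(100_000)))
```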

-----Original Message-----
From: Walter Underwood [mailto:wunderwood@netflix.com]
Sent: Tuesday, April 07, 2009 11:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Coming up with a model of memory usage

Why tokenize the date? It sorts just fine as a string. --wunder



RE: Coming up with a model of memory usage

Posted by Joe Pollard <Jo...@bazaarvoice.com>.
Good info to have.  Thanks Erick.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Tuesday, April 07, 2009 10:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Coming up with a model of memory usage

Your observations about date sorting are probably correct. The
issue is that Lucene's sort caches are built from a field's unique
terms, and a full timestamp like 2008-08-12T12:18:26.510 has many
more unique terms (nearly every value is unique) than either field
does once it is split. You can reduce sorting memory even further by
splitting into more fields, but whether that's worth the effort is
up to you.

Best
Erick


Re: Coming up with a model of memory usage

Posted by Walter Underwood <wu...@netflix.com>.
Why tokenize the date? It sorts just fine as a string. --wunder
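Walter's point can be checked directly: fixed-width ISO-8601 strings in a single time zone sort lexicographically in chronological order. The sketch below (hypothetical helper names and sample values) also shows that sorting by the two split fields gives the same key order as the plain string sort, differing only in how ties within one second are broken, because the 'HHMMSS' field drops milliseconds.

```python
def split_key(ts: str) -> tuple[str, str]:
    """Hypothetical helper: ('YYYY-MM-DD', 'HHMMSS') sort key, millis dropped."""
    day, clock = ts.split("T")
    return day, clock.split(".")[0].replace(":", "")


def is_non_decreasing(keys) -> bool:
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Hypothetical sample values, all assumed to be in the same time zone.
timestamps = [
    "2008-08-12T12:18:26.510",
    "2008-01-03T23:59:59.999",
    "2008-08-12T09:05:00.000",
    "2007-12-31T00:00:00.001",
    "2008-08-12T12:18:26.020",
]

# Plain lexicographic sort of the fixed-width strings: chronological order.
by_string = sorted(timestamps)

# Sort by the two split fields instead (Python's sort is stable, so documents
# that tie within the same second keep their input order).
by_fields = sorted(timestamps, key=split_key)

# The split keys derived from the string order never go backwards, and both
# orderings produce the same sequence of keys.
assert is_non_decreasing([split_key(ts) for ts in by_string])
assert [split_key(ts) for ts in by_fields] == [split_key(ts) for ts in by_string]

print(by_string)
print(by_fields)
```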

On 4/7/09 8:50 AM, "Erick Erickson" <er...@gmail.com> wrote:

> Your observations about date sorting are probably correct. The
> issue is that Lucene's sort caches are built from a field's unique
> terms, and a full timestamp like 2008-08-12T12:18:26.510 has many
> more unique terms (nearly every value is unique) than either field
> does once it is split. You can reduce sorting memory even further by
> splitting into more fields, but whether that's worth the effort is
> up to you.
> 
> Best
> Erick


Re: Coming up with a model of memory usage

Posted by Erick Erickson <er...@gmail.com>.
Your observations about date sorting are probably correct. The
issue is that Lucene's sort caches are built from a field's unique
terms, and a full timestamp like 2008-08-12T12:18:26.510 has many
more unique terms (nearly every value is unique) than either field
does once it is split. You can reduce sorting memory even further by
splitting into more fields, but whether that's worth the effort is
up to you.

Best
Erick
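A rough back-of-the-envelope model of Erick's explanation, as a sketch only: it assumes a string sort cache shaped like Lucene's FieldCache.StringIndex (one int ord per document plus an array holding each distinct term once) and uses hypothetical round numbers for JVM object overhead. The printed figures depend entirely on those assumed constants and cardinalities; they are not a prediction of the measurement reported in this thread.

```python
from dataclasses import dataclass

# Assumed JVM costs (hypothetical round numbers; real values depend on the
# JVM, 32- vs 64-bit mode, and pointer compression).
INT_BYTES = 4               # one int ord per document
REF_BYTES = 8               # one reference per slot in the term lookup array
STRING_OVERHEAD_BYTES = 40  # object headers and fields for a String and its char[]
CHAR_BYTES = 2              # Java chars are UTF-16 code units


@dataclass
class SortField:
    name: str
    unique_terms: int
    avg_term_chars: int


def string_sort_cache_bytes(max_doc: int, field: SortField) -> int:
    """Estimated size of one string sort cache: per-doc ords plus unique terms."""
    ords = max_doc * INT_BYTES
    lookup = (field.unique_terms + 1) * REF_BYTES  # slot 0 stands for "no term"
    terms = field.unique_terms * (STRING_OVERHEAD_BYTES + CHAR_BYTES * field.avg_term_chars)
    return ords + lookup + terms


def total_bytes(max_doc, fields):
    """Sum the estimates for every field used in the sort."""
    return sum(string_sort_cache_bytes(max_doc, f) for f in fields)


if __name__ == "__main__":
    n = 1_000_000  # hypothetical index size
    full = [SortField("timestamp", unique_terms=n, avg_term_chars=23)]
    split = [SortField("day", unique_terms=366, avg_term_chars=10),
             SortField("time", unique_terms=86_400, avg_term_chars=6)]
    print("full :", total_bytes(n, full))
    print("split:", total_bytes(n, split))
```

In this model each extra sort field adds another int per document, so splitting into more fields trades a smaller term table for a larger total of ord arrays; that trade-off is one way to frame whether further splitting is worth the effort.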

On Tue, Apr 7, 2009 at 10:55 AM, Joe Pollard <jo...@bazaarvoice.com> wrote:

> It doesn't seem to matter whether fields are stored or not, but I've
> found a rather striking difference in the memory requirements during
> sorting. Sorting on a string field representing datetime like
> '2008-08-12T12:18:26.510' is about twice as memory-intensive as sorting
> first by '2008-08-12' and then by '121826'.
>
> Any other tips/guidance like this would be great!
>
> Thanks,
> -Joe

RE: Coming up with a model of memory usage

Posted by Joe Pollard <Jo...@bazaarvoice.com>.
Cool, great resource, thanks.

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com]
Sent: Tuesday, April 07, 2009 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Coming up with a model of memory usage

There are many threads about memory usage on this mailing list; searching
the archives will turn up a lot of information.

http://lucene.markmail.org/search/solr+sorting+memory

--
Regards,
Shalin Shekhar Mangar.

Re: Coming up with a model of memory usage

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Apr 7, 2009 at 8:25 PM, Joe Pollard <jo...@bazaarvoice.com> wrote:

> It doesn't seem to matter whether fields are stored or not, but I've
> found a rather striking difference in the memory requirements during
> sorting. Sorting on a string field representing datetime like
> '2008-08-12T12:18:26.510' is about twice as memory-intensive as sorting
> first by '2008-08-12' and then by '121826'.
>
> Any other tips/guidance like this would be great!

There are many threads about memory usage on this mailing list; searching
the archives will turn up a lot of information.

http://lucene.markmail.org/search/solr+sorting+memory

-- 
Regards,
Shalin Shekhar Mangar.

Re: Coming up with a model of memory usage

Posted by Joe Pollard <jo...@bazaarvoice.com>.
It doesn't seem to matter whether fields are stored or not, but I've
found a rather striking difference in the memory requirements during
sorting. Sorting on a string field representing datetime like
'2008-08-12T12:18:26.510' is about twice as memory-intensive as sorting
first by '2008-08-12' and then by '121826'.

Any other tips/guidance like this would be great!

Thanks,
-Joe
