Posted to solr-user@lucene.apache.org by Benjamin Wiens <be...@gmail.com> on 2014/06/18 17:00:04 UTC

Calculating filterCache size

Hi,
I'm looking for a formula to calculate filterCache size in RAM.

The best estimate I can find is here:
http://stackoverflow.com/questions/20999904/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem

An index of 1,000,000 documents with a filterCache holding 100,000 entries
would thus take 12.5 GB of RAM with this formula:

100,000,000,000 bits / 8 (to bytes) / 1000 (to KB) / 1000 (to MB) / 1000 (to
GB) = 12.5 GB

Can anyone confirm this formula? I am aware that if the result set of a
filter query is small, Solr can store it in a more compact structure that
takes up less memory.
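
Spelling that arithmetic out as a rough Python sketch (the 100,000 entries
are what the 100 billion bits figure implies for a 1,000,000-document index;
the small-result-set numbers are purely illustrative):

    # Rough sketch of the bitset-based estimate above (decimal units).
    max_doc = 1000000         # documents in the index
    cache_entries = 100000    # filterCache "size", i.e. number of cached entries

    bits_total = max_doc * cache_entries         # one bit per document per entry
    print(bits_total / 8 / 1000 / 1000 / 1000)   # 12.5  (GB)

    # If a filter matches only a few documents, Solr can keep the entry as a
    # sorted list of doc ids (roughly 4 bytes per hit) instead of a full bitset.
    small_hits = 1000                            # illustrative small result set
    print(small_hits * 4, "bytes vs", max_doc // 8, "bytes")   # 4000 vs 125000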

I know I can just start with a low filterCache size and kick it up in my
environment, but I'd like to come up with a scientific formula.

Thanks,
Ben

Re: Calculating filterCache size

Posted by Benjamin Wiens <be...@gmail.com>.
Thank you for your help!

I wrote an article on performance testing the Solr filterCache, "Shedding
Light on Apache Solr filterCache for VuFind," which I am hoping to get
published.

https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10

Anyone can comment, and I would highly appreciate it! My biggest fear is
having something inaccurate about the filterCache or Solr in general in
there. Any and all suggestions are welcome!

Thanks again,
Ben


On Thu, Jun 19, 2014 at 3:42 PM, Erick Erickson <er...@gmail.com>
wrote:

> That's specific to using facet.method=enum, but I admit it's easy to miss
> that.
>
> I added a note about that though...
>
> Thanks for pointing that out!
>
>
> On Thu, Jun 19, 2014 at 9:38 AM, Benjamin Wiens
> <be...@gmail.com> wrote:
> > Thanks to both of you. Yes, the config I mentioned is illustrative; we
> > decided on 512 after thorough testing. However, when you google "Solr
> > filterCache", the first link is the community wiki, which has a config
> > even higher than my illustration and quite different from the official
> > reference guide. It might be a good idea to change that example unless
> > the index is very small.
> >
> > http://wiki.apache.org/solr/SolrCaching#filterCache
> >
> >     <filterCache class="solr.LRUCache"
> >                  size="16384"
> >                  initialSize="4096"
> >                  autowarmCount="4096"/>
> >
> >
> >
> >
> >
> >
> > On Thu, Jun 19, 2014 at 9:48 AM, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> >> Ben:
> >>
> >> As Shawn says, you're on the right track...
> >>
> >> Do note, though, that a 10K size here is probably excessive, YMMV of
> >> course.
> >>
> >> And an autowarm count of 5,000 is almost _certainly_ far more than you
> >> want. All these fq clauses get re-executed whenever a new searcher is
> >> opened (soft commit or hard commit with openSearcher=true). I realize
> >> this may just be illustrative. Is this your actual setup? And if so,
> >> what is your motivation for 5,000 autowarm count?
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey <so...@elyograg.org>
> >> wrote:
> >> > On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
> >> >> Thanks Erick!
> >> >> So let's say I have a config of
> >> >>
> >> >> <filterCache
> >> >> class="solr.FastLRUCache"
> >> >> size="10000"
> >> >> initialSize="10000"
> >> >> autowarmCount="5000"/>
> >> >>
> >> >> MaxDocuments = 1,000,000
> >> >>
> >> >> So according to your formula, filterCache should roughly have the
> >> >> potential to consume this much RAM:
> >> >> ((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 =
> >> >> 1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb
> >> >
> >> > Yes, this is essentially correct.  If you want to arrive at a number
> >> > that's more accurate for the way that OS tools will report memory,
> >> > you'll divide by 1024 instead of 1000 for each of the larger units.
> >> > That results in a size of 1.16 GB instead of 1.25 GB. Computers think
> >> > in powers of 2; dividing by 1000 reflects how people think, in powers
> >> > of 10. It's the same thing that causes your computer to report 931 GB
> >> > for a 1 TB hard drive.
> >> >
> >> > Thanks,
> >> > Shawn
> >> >
> >>
>

Re: Calculating filterCache size

Posted by Erick Erickson <er...@gmail.com>.
That's specific to using facet.method=enum, but I admit it's easy to miss
that.
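
Roughly speaking, facet.method=enum can create one filterCache entry per
unique value in the faceted field, which is why that wiki example is sized
so aggressively. A back-of-the-envelope sketch (the numbers are purely
illustrative, and facet.enum.cache.minDf can keep rare terms out of the
cache):

    # Sketch: with facet.method=enum, roughly one cached DocSet per unique
    # term in the faceted field. All numbers here are illustrative.
    max_doc = 1000000          # docs in the index (per shard)
    unique_terms = 16384       # distinct values in the faceted field

    bytes_per_entry = max_doc / 8 + 128     # bitset plus fq-text overhead
    total_mb = unique_terms * bytes_per_entry / 1000 / 1000
    print(round(total_mb), "MB if every term's DocSet gets cached")  # ~2050 MB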

I added a note about that though...

Thanks for pointing that out!


On Thu, Jun 19, 2014 at 9:38 AM, Benjamin Wiens
<be...@gmail.com> wrote:
> Thanks to both of you. Yes, the config I mentioned is illustrative; we
> decided on 512 after thorough testing. However, when you google "Solr
> filterCache", the first link is the community wiki, which has a config even
> higher than my illustration and quite different from the official reference
> guide. It might be a good idea to change that example unless the index is
> very small.
>
> http://wiki.apache.org/solr/SolrCaching#filterCache
>
>     <filterCache class="solr.LRUCache"
>                  size="16384"
>                  initialSize="4096"
>                  autowarmCount="4096"/>
>
>
>
>
>
>
> On Thu, Jun 19, 2014 at 9:48 AM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> Ben:
>>
>> As Shawn says, you're on the right track...
>>
>> Do note, though, that a 10K size here is probably excessive, YMMV of
>> course.
>>
>> And an autowarm count of 5,000 is almost _certainly_ far more than you
>> want. All these fq clauses get re-executed whenever a new searcher is
>> opened (soft commit or hard commit with openSearcher=true). I realize
>> this may just be illustrative. Is this your actual setup? And if so,
>> what is your motivation for 5,000 autowarm count?
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey <so...@elyograg.org> wrote:
>> > On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
>> >> Thanks Erick!
>> >> So let's say I have a config of
>> >>
>> >> <filterCache
>> >> class="solr.FastLRUCache"
>> >> size="10000"
>> >> initialSize="10000"
>> >> autowarmCount="5000"/>
>> >>
>> >> MaxDocuments = 1,000,000
>> >>
>> >> So according to your formula, filterCache should roughly have the
>> >> potential to consume this much RAM:
>> >> ((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 =
>> >> 1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb
>> >
>> > Yes, this is essentially correct.  If you want to arrive at a number
>> > that's more accurate for the way that OS tools will report memory,
>> > you'll divide by 1024 instead of 1000 for each of the larger units.
>> > That results in a size of 1.16 GB instead of 1.25 GB. Computers think in
>> > powers of 2; dividing by 1000 reflects how people think, in powers of 10.
>> > It's the same thing that causes your computer to report 931 GB for a 1 TB
>> > hard drive.
>> >
>> > Thanks,
>> > Shawn
>> >
>>

Re: Calculating filterCache size

Posted by Benjamin Wiens <be...@gmail.com>.
Thanks to both of you. Yes, the config I mentioned is illustrative; we
decided on 512 after thorough testing. However, when you google "Solr
filterCache", the first link is the community wiki, which has a config even
higher than my illustration and quite different from the official reference
guide. It might be a good idea to change that example unless the index is
very small.

http://wiki.apache.org/solr/SolrCaching#filterCache

    <filterCache class="solr.LRUCache"
                 size="16384"
                 initialSize="4096"
                 autowarmCount="4096"/>






On Thu, Jun 19, 2014 at 9:48 AM, Erick Erickson <er...@gmail.com>
wrote:

> Ben:
>
> As Shawn says, you're on the right track...
>
> Do note, though, that a 10K size here is probably excessive, YMMV of
> course.
>
> And an autowarm count of 5,000 is almost _certainly_ far more than you
> want. All these fq clauses get re-executed whenever a new searcher is
> opened (soft commit or hard commit with openSearcher=true). I realize
> this may just be illustrative. Is this your actual setup? And if so,
> what is your motivation for 5,000 autowarm count?
>
> Best,
> Erick
>
> On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey <so...@elyograg.org> wrote:
> > On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
> >> Thanks Erick!
> >> So let's say I have a config of
> >>
> >> <filterCache
> >> class="solr.FastLRUCache"
> >> size="10000"
> >> initialSize="10000"
> >> autowarmCount="5000"/>
> >>
> >> MaxDocuments = 1,000,000
> >>
> >> So according to your formula, filterCache should roughly have the
> >> potential to consume this much RAM:
> >> ((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 =
> >> 1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb
> >
> > Yes, this is essentially correct.  If you want to arrive at a number
> > that's more accurate for the way that OS tools will report memory,
> > you'll divide by 1024 instead of 1000 for each of the larger units.
> > That results in a size of 1.16 GB instead of 1.25 GB. Computers think in
> > powers of 2; dividing by 1000 reflects how people think, in powers of 10.
> > It's the same thing that causes your computer to report 931 GB for a 1 TB
> > hard drive.
> >
> > Thanks,
> > Shawn
> >
>

Re: Calculating filterCache size

Posted by Erick Erickson <er...@gmail.com>.
Ben:

As Shawn says, you're on the right track...

Do note, though, that a 10K size here is probably excessive, YMMV of course.

And an autowarm count of 5,000 is almost _certainly_ far more than you
want. All these fq clauses get re-executed whenever a new searcher is
opened (soft commit or hard commit with openSearcher=true). I realize
this may just be illustrative. Is this your actual setup? And if so,
what is your motivation for 5,000 autowarm count?
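
Just to put a rough number on that cost (the per-fq time below is an
illustrative guess, not a measurement):

    # Sketch: every autowarmed entry is an fq that gets re-executed when a
    # new searcher opens. 10 ms per fq is an illustrative guess.
    autowarm_count = 5000
    ms_per_fq = 10
    print(autowarm_count * ms_per_fq / 1000, "seconds of warming per searcher")  # 50.0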

Best,
Erick

On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey <so...@elyograg.org> wrote:
> On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
>> Thanks Erick!
>> So let's say I have a config of
>>
>> <filterCache
>> class="solr.FastLRUCache"
>> size="10000"
>> initialSize="10000"
>> autowarmCount="5000"/>
>>
>> MaxDocuments = 1,000,000
>>
>> So according to your formula, filterCache should roughly have the potential
>> to consume this much RAM:
>> ((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 =
>> 1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb
>
> Yes, this is essentially correct.  If you want to arrive at a number
> that's more accurate for the way that OS tools will report memory,
> you'll divide by 1024 instead of 1000 for each of the larger units.
> That results in a size of 1.16 GB instead of 1.25 GB. Computers think in
> powers of 2; dividing by 1000 reflects how people think, in powers of 10.
> It's the same thing that causes your computer to report 931 GB for a 1 TB
> hard drive.
>
> Thanks,
> Shawn
>

Re: Calculating filterCache size

Posted by Shawn Heisey <so...@elyograg.org>.
On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
> Thanks Erick!
> So let's say I have a config of
>
> <filterCache
> class="solr.FastLRUCache"
> size="10000"
> initialSize="10000"
> autowarmCount="5000"/>
>
> MaxDocuments = 1,000,000
>
> So according to your formula, filterCache should roughly have the potential
> to consume this much RAM:
> ((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 =
> 1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb

Yes, this is essentially correct.  If you want to arrive at a number
that's more accurate for the way that OS tools will report memory,
you'll divide by 1024 instead of 1000 for each of the larger units. 
That results in a size of 1.16 GB instead of 1.25 GB. Computers think in
powers of 2; dividing by 1000 reflects how people think, in powers of 10.
It's the same thing that causes your computer to report 931 GB for a 1 TB
hard drive.
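
The two conventions side by side, using the byte count from your example
(plain arithmetic, nothing Solr-specific):

    total_bytes = ((1000000 / 8) + 128) * 10000   # 1,251,280,000 bytes
    print(total_bytes / 1000 ** 3)                # 1.25128  (decimal GB)
    print(total_bytes / 1024 ** 3)                # about 1.165 (GiB, what OS tools show)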

Thanks,
Shawn


Re: Calculating filterCache size

Posted by Benjamin Wiens <be...@gmail.com>.
Thanks Erick!
So let's say I have a config of

<filterCache
class="solr.FastLRUCache"
size="10000"
initialSize="10000"
autowarmCount="5000"/>

MaxDocuments = 1,000,000

So according to your formula, filterCache should roughly have the potential
to consume this much RAM:
((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 =
1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb
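
Or, as a quick sanity check of the same numbers in Python:

    max_doc = 1000000
    cache_size = 10000                        # number of entries, not bytes
    bytes_per_entry = max_doc / 8 + 128       # bitset + ~128 bytes of fq text
    total_bytes = bytes_per_entry * cache_size
    print(total_bytes)                        # 1251280000.0
    print(total_bytes / 1000 / 1000 / 1000)   # 1.25128 -> about 1.25 GB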

Thanks,
Ben





On Wed, Jun 18, 2014 at 11:13 AM, Erick Erickson <er...@gmail.com>
wrote:

> You pretty much have it. Actually, the number you want is the "maxDoc"
> figure from the admin UI screen. The formula will be maxDoc/8 bytes +
> (some overhead but not enough to matter), for EVERY entry.
>
> You'll never fit 100B docs on a single machine anyway. Lucene has a
> hard limit of 2B docs, and I've never heard of anyone fitting even 2B
> docs on a single machine in a performant manner. So under any
> circumstance this won't all be on one machine. You have to figure it
> locally for each shard. And at this size there's no doubt you'll be
> sharding!
>
> Also be very careful here: the "size" parameter in the cache
> definition is the number of _entries_, NOT the number of _bytes_.
>
> _Each_ entry is that size! So the cache requirements will be close to
> ((maxDoc/8) + 128) * (size_defined_in_the_config_file), where 128 is
> an approximation of the storage necessary for the text of the fq
> clause.
>
> Best,
> Erick
>
> On Wed, Jun 18, 2014 at 8:00 AM, Benjamin Wiens
> <be...@gmail.com> wrote:
> > Hi,
> > I'm looking for a formula to calculate filterCache size in RAM.
> >
> > The best estimate I can find is here:
> >
> > http://stackoverflow.com/questions/20999904/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem
> >
> > An index of 1,000,000 documents with a filterCache holding 100,000
> > entries would thus take 12.5 GB of RAM with this formula:
> >
> > 100,000,000,000 bits / 8 (to bytes) / 1000 (to KB) / 1000 (to MB) / 1000
> > (to GB) = 12.5 GB
> >
> > Can anyone confirm this formula? I am aware that if the result set of a
> > filter query is small, Solr can store it in a more compact structure that
> > takes up less memory.
> >
> > I know I can just start with a low filterCache size and kick it up in my
> > environment, but I'd like to come up with a scientific formula.
> >
> > Thanks,
> > Ben
>

Re: Calculating filterCache size

Posted by Erick Erickson <er...@gmail.com>.
You pretty much have it. Actually, the number you want is the "maxDoc"
figure from the admin UI screen. The formula will be maxDoc/8 bytes +
(some overhead but not enough to matter), for EVERY entry.

You'll never fit 100B docs on a single machine anyway. Lucene has a
hard limit of 2B docs, and I've never heard of anyone fitting even 2B
docs on a single machine in a performant manner. So under any
circumstance this won't all be on one machine. You have to figure it
locally for each shard. And at this size there's no doubt you'll be
sharding!

Also be very careful here: the "size" parameter in the cache
definition is the number of _entries_, NOT the number of _bytes_.

_Each_ entry is that size! So the cache requirements will be close to
((maxDoc/8) + 128) * (size_defined_in_the_config_file), where 128 is
an approximation of the storage necessary for the text of the fq
clause.
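
As a sketch, here is that rule of thumb as a small helper (the shard numbers
below are only an example, not anything from this thread; with SolrCloud you
would plug in each shard's own maxDoc from the admin UI):

    def filter_cache_bytes(max_doc, entries, fq_overhead=128):
        """Rough upper bound: one max_doc-bit bitset plus ~128 bytes of fq
        text per cached entry. Apply per shard, with that shard's maxDoc."""
        return (max_doc / 8 + fq_overhead) * entries

    # Illustrative: 1 billion docs spread over 8 shards, size=512 per shard.
    per_shard_max_doc = 1000000000 // 8          # 125,000,000 docs per shard
    print(filter_cache_bytes(per_shard_max_doc, 512) / 1000 ** 3, "GB per shard")
    # -> about 8.0 GB per shard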

Best,
Erick

On Wed, Jun 18, 2014 at 8:00 AM, Benjamin Wiens
<be...@gmail.com> wrote:
> Hi,
> I'm looking for a formula to calculate filterCache size in RAM.
>
> The best estimate I can find is here:
> http://stackoverflow.com/questions/20999904/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem
>
> An index of 1,000,000 documents with a filterCache holding 100,000 entries
> would thus take 12.5 GB of RAM with this formula:
>
> 100,000,000,000 bits / 8 (to bytes) / 1000 (to KB) / 1000 (to MB) / 1000 (to
> GB) = 12.5 GB
>
> Can anyone confirm this formula? I am aware that if the result set of a
> filter query is small, Solr can store it in a more compact structure that
> takes up less memory.
>
> I know I can just start with a low filterCache size and kick it up in my
> environment, but I'd like to come up with a scientific formula.
>
> Thanks,
> Ben