Posted to solr-user@lucene.apache.org by Manohar Sripada <ma...@gmail.com> on 2014/12/23 09:48:33 UTC

Loading data to FieldValueCache

Hello,

According to the wiki, the fieldValueCache
(http://wiki.apache.org/solr/SolrCaching#fieldValueCache) is mostly used for
faceting.

Can someone please shed some light on how to load data into this cache? In
particular, which Solr query option causes data to be loaded into it?

My requirement is to show 10 facet fields (with a facet limit of 5) in my UI,
and I want to speed this up by using this cache. Is there a way to specify
only the list of fields to be loaded into the fieldValueCache?

Thanks,
Manohar

Re: Loading data to FieldValueCache

Posted by Manohar Sripada <ma...@gmail.com>.
Erick,

I am trying to do a premature optimization. *There will be no updates to my
index. So, no worries about ageing out or garbage collection.*
Let me check my understanding: when we talk about the filterCache, it just
stores document IDs, right?

My setup is as follows. There are 16 nodes in my SolrCloud cluster, each
having 64 GB of RAM, out of which I am allocating 45 GB to Solr. I have a
collection (say Products, containing around 100 million docs) that I created
with 64 shards, a replication factor of 2, and 8 shards per node. Each shard
holds around 1.6 million documents. So my math for the filterCache for a
specific filter is:


   - an average filter query will be 20 bytes, so 1000 (the number of distinct
   states) x 20 bytes = 20 KB
   - assuming the union of doc IDs across all values of a given filter equals
   the total number of doc IDs in the index: there are 1.6 million documents
   in a Solr core, so 1,600,000 x 8 bytes (for each doc ID) = 12.8 MB
   - there will be 8 Solr cores per node, so 8 x 12.8 MB = *102 MB*

This is the size of the cache for a single filter on a single node. Given the
heap size I have allocated, I think this shouldn't be an issue.
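
For anyone who wants to re-run the arithmetic, here is a minimal sketch in
Python. Every input is a figure from this message; in particular the 8 bytes
per doc ID and the 20-byte average filter query are my own assumptions, not
measured Solr numbers. Note that 1000 x 20 bytes works out to 20 KB.

```python
# Back-of-the-envelope check of the estimate in this message.
# All constants are the poster's assumptions, not measured Solr values.
DOCS_PER_CORE = 1_600_000       # ~100 million docs spread over 64 shards
BYTES_PER_DOC_ID = 8            # assumed cost of one cached doc ID
CORES_PER_NODE = 8
DISTINCT_STATES = 1000
BYTES_PER_FILTER_QUERY = 20     # assumed average size of one cache key

key_bytes = DISTINCT_STATES * BYTES_PER_FILTER_QUERY
doc_id_bytes_per_core = DOCS_PER_CORE * BYTES_PER_DOC_ID
doc_id_bytes_per_node = doc_id_bytes_per_core * CORES_PER_NODE

print(f"filter query keys: {key_bytes / 1e3:.0f} KB")
print(f"doc IDs per core:  {doc_id_bytes_per_core / 1e6:.1f} MB")
print(f"doc IDs per node:  {doc_id_bytes_per_node / 1e6:.1f} MB")
```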

Thanks,
Manohar

On Fri, Dec 26, 2014 at 10:56 PM, Erick Erickson <er...@gmail.com>
wrote:

> Manohar:
>
> Please approach this cautiously. You state that you have "hundreds of
> states".
> Every 100 states will use roughly 1.2G of your filter cache. Just for this
> field. Plus it'll fill up the cache and they may soon be aged out anyway.
> Can you really afford the space? Is it really a problem that needs to be
> solved at this point? This _really_ sounds like premature optimization
> to me as you haven't
> demonstrated that there's an actual problem you're solving.
>
> OTOH, of course, if you're experimenting to better understand all the
> ins and outs
> of the process that's another thing entirely ;)....
>
> Toke:
>
> I don't know the complete algorithm, but if the number of docs that
> satisfy the fq is "small enough",
> then just the internal Lucene doc IDs are stored rather than a bitset.
> What exactly "small enough" is
> I don't know off the top of my head. And I've got to assume looking
> stuff up in a list is slower than
> indexing into a bitset so I suspect "small enough" is very small....
>
> On Fri, Dec 26, 2014 at 3:00 AM, Manohar Sripada <ma...@gmail.com>
> wrote:
> > Thanks Toke for the explanation, I will experiment with
> > f.state.facet.method=enum
> >
> > Thanks,
> > Manohar
> >
> > On Fri, Dec 26, 2014 at 4:09 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> > wrote:
> >
> >> Manohar Sripada [manohar211@gmail.com] wrote:
> >> > I have 100 million documents in my index. The maxDoc here is the maximum
> >> > Documents in each shard, right? How is it determined that each entry will
> >> > occupy maxDoc/8 approximately.
> >>
> >> Assuming that it is random whether a document is part of the result set or
> >> not, the most efficient representation is 1 bit/doc (this is often called a
> >> bitmap or bitset). So the total number of bits will be maxDoc, which is the
> >> same as maxDoc/8 bytes.
> >>
> >> Of course, result sets are rarely random, so it is possible to have other
> >> and more compact representations. I do not know how that plays out in
> >> Lucene. Hopefully somebody else can help here.
> >>
> >> > If I have to add facet.method=enum every time in the query, how should I
> >> > specify for each field separately?
> >>
> >> f.state.facet.method=enum
> >>
> >> See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters
> >>
> >> - Toke Eskildsen
> >>
>

Re: Loading data to FieldValueCache

Posted by Erick Erickson <er...@gmail.com>.
bq: There will be no updates to my index. So, no worries about ageing
out or garbage collection

This is irrelevant to aging out filterCache entries; that is purely a
query-time matter.

bq: Each having 64 GB of RAM, out of which I am allocating 45 GB to Solr.

It's usually a mistake to give Solr so much RAM relative to the OS; see Uwe's
excellent blog post here:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

That said, you know your system best. And the fact that you have so many
shards may well mean that memory considerations aren't relevant.

Personally, though, I think you've massively over-sharded your collection and
are incurring significant overhead, but again you know your requirements much
better than I do.

Best,
Erick

On Mon, Dec 29, 2014 at 7:43 AM, Yonik Seeley <yo...@heliosearch.com> wrote:
> On Fri, Dec 26, 2014 at 12:26 PM, Erick Erickson
> <er...@gmail.com> wrote:
>> I don't know the complete algorithm, but if the number of docs that
>> satisfy the fq is "small enough",
>> then just the internal Lucene doc IDs are stored rather than a bitset.
>
> If smaller than maxDoc/64 ids are collected, a sorted int set is used
> instead of a bitset.
> Also, the enum method can skip caching for the "smaller" terms:
>
> facet.enum.cache.minDf=100
> might be good for general purpose.
> Or set the value really high to not use the filter cache at all.
>
> -Yonik

Re: Loading data to FieldValueCache

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Fri, Dec 26, 2014 at 12:26 PM, Erick Erickson
<er...@gmail.com> wrote:
> I don't know the complete algorithm, but if the number of docs that
> satisfy the fq is "small enough",
> then just the internal Lucene doc IDs are stored rather than a bitset.

If fewer than maxDoc/64 ids are collected, a sorted int set is used
instead of a bitset.
Also, the enum method can skip caching for the "smaller" terms:

facet.enum.cache.minDf=100
might be good for general purpose.
Or set the value really high to not use the filter cache at all.
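
To make the maxDoc/64 rule concrete, here is a rough Python sketch of the
size of one cached filter. It is not Solr code: the function name is made up,
the 4 bytes per id assumes plain Java ints, and object overhead is ignored.
The maxDoc of 1.6 million is the per-shard figure quoted elsewhere in this
thread.

```python
def filter_entry_bytes(matching_docs: int, max_doc: int) -> int:
    """Approximate size of one cached filter under the rule described above."""
    if matching_docs < max_doc // 64:
        # Small result: sorted set of int doc ids (4 bytes each, assumed).
        return matching_docs * 4
    # Otherwise: bitset with one bit per document in the core.
    return max_doc // 8


MAX_DOC = 1_600_000  # docs per shard, figure taken from this thread

print(filter_entry_bytes(1_000, MAX_DOC))    # small term, stored as an int set
print(filter_entry_bytes(500_000, MAX_DOC))  # large term, stored as a bitset
```

With these assumptions the int set never exceeds half the size of the bitset,
since the cutoff of maxDoc/64 ids at 4 bytes each comes to maxDoc/16 bytes.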

-Yonik

Re: Loading data to FieldValueCache

Posted by Erick Erickson <er...@gmail.com>.
Manohar:

Please approach this cautiously. You state that you have "hundreds of states".
Every 100 states will use roughly 1.2 GB of your filter cache, just for this
field. Plus it'll fill up the cache, and the entries may soon be aged out
anyway. Can you really afford the space? Is it really a problem that needs to
be solved at this point? This _really_ sounds like premature optimization to
me, as you haven't demonstrated that there's an actual problem you're solving.

OTOH, of course, if you're experimenting to better understand all the ins and
outs of the process, that's another thing entirely ;)....

Toke:

I don't know the complete algorithm, but if the number of docs that satisfy
the fq is "small enough", then just the internal Lucene doc IDs are stored
rather than a bitset. What exactly "small enough" is I don't know off the top
of my head. And I've got to assume looking stuff up in a list is slower than
indexing into a bitset, so I suspect "small enough" is very small....

On Fri, Dec 26, 2014 at 3:00 AM, Manohar Sripada <ma...@gmail.com> wrote:
> Thanks Toke for the explanation, I will experiment with
> f.state.facet.method=enum
>
> Thanks,
> Manohar
>
> On Fri, Dec 26, 2014 at 4:09 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
>
>> Manohar Sripada [manohar211@gmail.com] wrote:
>> > I have 100 million documents in my index. The maxDoc here is the maximum
>> > Documents in each shard, right? How is it determined that each entry will
>> > occupy maxDoc/8 approximately.
>>
>> Assuming that it is random whether a document is part of the result set or
>> not, the most efficient representation is 1 bit/doc (this is often called a
>> bitmap or bitset). So the total number of bits will be maxDoc, which is the
>> same as maxDoc/8 bytes.
>>
>> Of course, result sets are rarely random, so it is possible to have other
>> and more compact representations. I do not know how that plays out in
>> Lucene. Hopefully somebody else can help here.
>>
>> > If I have to add facet.method=enum every time in the query, how should I
>> > specify for each field separately?
>>
>> f.state.facet.method=enum
>>
>> See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters
>>
>> - Toke Eskildsen
>>

Re: Loading data to FieldValueCache

Posted by Manohar Sripada <ma...@gmail.com>.
Thanks Toke for the explanation, I will experiment with
f.state.facet.method=enum

Thanks,
Manohar

On Fri, Dec 26, 2014 at 4:09 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> Manohar Sripada [manohar211@gmail.com] wrote:
> > I have 100 million documents in my index. The maxDoc here is the maximum
> > Documents in each shard, right? How is it determined that each entry will
> > occupy maxDoc/8 approximately.
>
> Assuming that it is random whether a document is part of the result set or
> not, the most efficient representation is 1 bit/doc (this is often called a
> bitmap or bitset). So the total number of bits will be maxDoc, which is the
> same as maxDoc/8 bytes.
>
> Of course, result sets are rarely random, so it is possible to have other
> and more compact representations. I do not know how that plays out in
> Lucene. Hopefully somebody else can help here.
>
> > If I have to add facet.method=enum every time in the query, how should I
> > specify for each field separately?
>
> f.state.facet.method=enum
>
> See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters
>
> - Toke Eskildsen
>

RE: Loading data to FieldValueCache

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Manohar Sripada [manohar211@gmail.com] wrote:
> I have 100 million documents in my index. The maxDoc here is the maximum
> Documents in each shard, right? How is it determined that each entry will
> occupy maxDoc/8 approximately.

Assuming that it is random whether a document is part of the result set or not, the most efficient representation is 1 bit/doc (this is often called a bitmap or bitset). So the total number of bits will be maxDoc, which is the same as maxDoc/8 bytes.

Of course, result sets are rarely random, so it is possible to have other and more compact representations. I do not know how that plays out in Lucene. Hopefully somebody else can help here.

> If I have to add facet.method=enum every time in the query, how should I
> specify for each field separately?

f.state.facet.method=enum

See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters
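
As an illustration of the per-field syntax, here is a small Python sketch
that builds a request using a different facet method for each field. The
host, port and collection name are hypothetical, rows=0 is only there to keep
the response small, and the field names "state" and "products" are the ones
used in the question.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; only the parameters matter here.
SELECT_URL = "http://localhost:8983/solr/products/select"

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "state"),
    ("facet.field", "products"),
    ("f.state.facet.method", "enum"),    # per-field override for "state"
    ("f.products.facet.method", "fc"),   # per-field override for "products"
]

query_url = f"{SELECT_URL}?{urlencode(params)}"
print(query_url)
```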

- Toke Eskildsen

Re: Loading data to FieldValueCache

Posted by Manohar Sripada <ma...@gmail.com>.
I have 100 million documents in my index. The maxDoc here is the maximum
number of documents in each shard, right? How is it determined that each
entry will occupy approximately maxDoc/8?

If I have to add facet.method=enum to the query every time, how should I
specify it for each field separately? As in the example above, I am planning
to use the products facet with facet.method=fc and the state facet with
facet.method=enum. How do I specify different facet methods for different
fields when requesting both of these facets?

Thanks,
Manohar

On Thu, Dec 25, 2014 at 2:52 AM, Erick Erickson <er...@gmail.com>
wrote:

> Inline.
>
> On Tue, Dec 23, 2014 at 11:12 PM, Manohar Sripada <ma...@gmail.com>
> wrote:
> > Okay. Let me try like this, as mine is a read-only index. I will have some
> > queries in firstSearcher event listener
> > 1) q=*:*&facet=true&facet.method=enum&facet.field=state   --> To load all
> > the state related unique values to filterCache.
>
> It's not necessary to use facet.method=enum here at all, just facet on the
> field and trust the heuristics built in. If you insist on this be very sure
> you can afford the space.
>
> >    > Will it use filterCache when I send a query with filter, eg:
> > fq=state:CA ??
>
> Don't know. Try it and look on admin/stats for the filter cache. You'll
> see a new insert if it does not use the one already there.
>
>
> >    > Once it is loaded, Do I need to send a query with facet.method=enum
> > every time along with facet.field=state to get state related facet data
> > from filterCache?
>
> See above. You haven't told us how many docs in your index, so we
> have no way of estimating how much this'll cost you. Each entry
> will be maxDoc/8 roughly, and you'll have about 50 of them.
>
> Yes, though, if you take control of the facet.method you'll have to
> add it every time.
>
> >
> > 2) q=*:*&facet=true&facet.method=fc&facet.field=products  --> To load the
> > values related to products to fieldCache.
> >     > Again, while querying for this facet do I need to send
> > facet.method=fc every time?
> See above.
>
> >
> > Thanks,
> > Manohar
> >
> > On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson <
> erickerickson@gmail.com>
> > wrote:
> >
> >> By and large, don't use the enum method unless there are _very_
> >> few unique values. It forms a filter (size roughly maxDoc/8 bytes)
> >> for _every_ unique value in the field, i.e. if you have 10,000 unique
> >> values it'll try to form 10,000 filterCache entries. Let the system
> >> do this for you automatically if appropriate.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada <ma...@gmail.com>
> >> wrote:
> >> > Thanks Erick and Toke,
> >> >
> >> > Also, I read here <
> https://wiki.apache.org/solr/SolrCaching#filterCache>
> >> that,
> >> > filterCache can also be used for faceting with facet.method=enum. So,
> I
> >> am
> >> > bit confused here on which one to use for faceting.
> >> >
> >> > One more thing here is I have different types of facets. (For example
> -
> >> > Product List, States). The Product List facet has lot many unique
> values
> >> > (around 10 million), where as States list will be in hundreds. So, I
> want
> >> > to come up with the numbers for size of fieldValueCache/filterCache
> and
> >> > pre-populate this.
> >> >
> >> > Thanks,
> >> > Manohar
> >> >
> >> > On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson <
> >> erickerickson@gmail.com>
> >> > wrote:
> >> >
> >> >> Or just not worry about it. The cache will be filled up automatically
> >> >> as you query for facets etc., the benefit to trying to fill it up as
> >> >> Toke outlines is just that the first few user queries that call for
> >> >> faceting will be somewhat faster. But after the first few user
> >> >> queries have gone through, it won't matter whether you've
> >> >> pre-loaded the cache or not.
> >> >>
> >> >> My point is that you'll get the benefit of the cache no matter what,
> >> >> it's just a matter of whether it's important that the first few users
> >> >> don't have to wait while they're loaded. And with DocValues,
> >> >> as Toke recommends, even that may be unimportant.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen <
> te@statsbiblioteket.dk
> >> >
> >> >> wrote:
> >> >> > Manohar Sripada [manohar211@gmail.com] wrote:
> >> >> >> From the wiki, it states that
> >> >> >> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly
> >> used
> >> >> for
> >> >> >> faceting.
> >> >> >
> >> >> >> Can someone please throw some light on how to load data to this
> >> cache.
> >> >> Like
> >> >> >> on what solrquery option does this consider the data to be loaded
> to
> >> >> this
> >> >> >> cache.
> >> >> >
> >> >> > The values are loaded on first facet call with facet.method=fc.
> >> >> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
> >> >> >
> >> >> >> My requirement is I have 10 facet fields (with facetlimit - 5) to
> be
> >> >> shown
> >> >> >> in my UI. I want to speed up this by using this cache. Is there a
> way
> >> >> where
> >> >> >> I can specify only the list of fields to be loaded to FieldValue
> >> Cache?
> >> >> >
> >> >> > Add a facet call as explicit warmup in your solrconfig.xml.
> >> >> >
> >> >> > You might want to consider DocValues for your facet fields.
> >> >> > https://cwiki.apache.org/confluence/display/solr/DocValues
> >> >> >
> >> >> > - Toke Eskildsen
> >> >>
> >>
>

Re: Loading data to FieldValueCache

Posted by Erick Erickson <er...@gmail.com>.
Inline.

On Tue, Dec 23, 2014 at 11:12 PM, Manohar Sripada <ma...@gmail.com> wrote:
> Okay. Let me try like this, as mine is a read-only index. I will have some
> queries in firstSearcher event listener
> 1) q=*:*&facet=true&facet.method=enum&facet.field=state   --> To load all
> the state related unique values to filterCache.

It's not necessary to use facet.method=enum here at all; just facet on the field
and trust the built-in heuristics. If you insist on this, be very sure you can
afford the space.

>    > Will it use filterCache when I send a query with filter, eg:
> fq=state:CA ??

Don't know. Try it and look on admin/stats for the filter cache. You'll
see a new insert if it does not use the one already there.


>    > Once it is loaded, Do I need to send a query with facet.method=enum
> every time along with facet.field=state to get state related facet data
> from filterCache?

See above. You haven't told us how many docs are in your index, so we
have no way of estimating how much this'll cost you. Each entry
will be roughly maxDoc/8 bytes, and you'll have about 50 of them.

Yes, though, if you take control of the facet.method you'll have to
add it every time.

>
> 2) q=*:*&facet=true&facet.method=fc&facet.field=products  --> To load the
> values related to products to fieldCache.
>     > Again, while querying for this facet do I need to send
> facet.method=fc every time?
See above.

>
> Thanks,
> Manohar
>
> On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> By and large, don't use the enum method unless there are _very_
>> few unique values. It forms a filter (size roughly maxDoc/8 bytes)
>> for _every_ unique value in the field, i.e. if you have 10,000 unique
>> values it'll try to form 10,000 filterCache entries. Let the system
>> do this for you automatically if appropriate.
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada <ma...@gmail.com>
>> wrote:
>> > Thanks Erick and Toke,
>> >
>> > Also, I read here <https://wiki.apache.org/solr/SolrCaching#filterCache>
>> that,
>> > filterCache can also be used for faceting with facet.method=enum. So, I
>> am
>> > bit confused here on which one to use for faceting.
>> >
>> > One more thing here is I have different types of facets. (For example -
>> > Product List, States). The Product List facet has lot many unique values
>> > (around 10 million), where as States list will be in hundreds. So, I want
>> > to come up with the numbers for size of fieldValueCache/filterCache and
>> > pre-populate this.
>> >
>> > Thanks,
>> > Manohar
>> >
>> > On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson <
>> erickerickson@gmail.com>
>> > wrote:
>> >
>> >> Or just not worry about it. The cache will be filled up automatically
>> >> as you query for facets etc., the benefit to trying to fill it up as
>> >> Toke outlines is just that the first few user queries that call for
>> >> faceting will be somewhat faster. But after the first few user
>> >> queries have gone through, it won't matter whether you've
>> >> pre-loaded the cache or not.
>> >>
>> >> My point is that you'll get the benefit of the cache no matter what,
>> >> it's just a matter of whether it's important that the first few users
>> >> don't have to wait while they're loaded. And with DocValues,
>> >> as Toke recommends, even that may be unimportant.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen <te@statsbiblioteket.dk
>> >
>> >> wrote:
>> >> > Manohar Sripada [manohar211@gmail.com] wrote:
>> >> >> From the wiki, it states that
>> >> >> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly
>> used
>> >> for
>> >> >> faceting.
>> >> >
>> >> >> Can someone please throw some light on how to load data to this
>> cache.
>> >> Like
>> >> >> on what solrquery option does this consider the data to be loaded to
>> >> this
>> >> >> cache.
>> >> >
>> >> > The values are loaded on first facet call with facet.method=fc.
>> >> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
>> >> >
>> >> >> My requirement is I have 10 facet fields (with facetlimit - 5) to be
>> >> shown
>> >> >> in my UI. I want to speed up this by using this cache. Is there a way
>> >> where
>> >> >> I can specify only the list of fields to be loaded to FieldValue
>> Cache?
>> >> >
>> >> > Add a facet call as explicit warmup in your solrconfig.xml.
>> >> >
>> >> > You might want to consider DocValues for your facet fields.
>> >> > https://cwiki.apache.org/confluence/display/solr/DocValues
>> >> >
>> >> > - Toke Eskildsen
>> >>
>>

Re: Loading data to FieldValueCache

Posted by Manohar Sripada <ma...@gmail.com>.
Okay. Let me try it like this, as mine is a read-only index. I will have some
queries in a firstSearcher event listener:
1) q=*:*&facet=true&facet.method=enum&facet.field=state   --> To load all
the unique state values into the filterCache.
   > Will it use the filterCache when I send a query with a filter, e.g.
fq=state:CA?
   > Once it is loaded, do I need to send facet.method=enum every time along
with facet.field=state to get the state facet data from the filterCache?

2) q=*:*&facet=true&facet.method=fc&facet.field=products  --> To load the
values related to products into the fieldCache.
    > Again, while querying for this facet, do I need to send
facet.method=fc every time?

Thanks,
Manohar

On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson <er...@gmail.com>
wrote:

> By and large, don't use the enum method unless there are _very_
> few unique values. It forms a filter (size roughly maxDoc/8 bytes)
> for _every_ unique value in the field, i.e. if you have 10,000 unique
> values it'll try to form 10,000 filterCache entries. Let the system
> do this for you automatically if appropriate.
>
> Best,
> Erick
>
> On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada <ma...@gmail.com>
> wrote:
> > Thanks Erick and Toke,
> >
> > Also, I read here <https://wiki.apache.org/solr/SolrCaching#filterCache>
> that,
> > filterCache can also be used for faceting with facet.method=enum. So, I
> am
> > bit confused here on which one to use for faceting.
> >
> > One more thing here is I have different types of facets. (For example -
> > Product List, States). The Product List facet has lot many unique values
> > (around 10 million), where as States list will be in hundreds. So, I want
> > to come up with the numbers for size of fieldValueCache/filterCache and
> > pre-populate this.
> >
> > Thanks,
> > Manohar
> >
> > On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson <
> erickerickson@gmail.com>
> > wrote:
> >
> >> Or just not worry about it. The cache will be filled up automatically
> >> as you query for facets etc., the benefit to trying to fill it up as
> >> Toke outlines is just that the first few user queries that call for
> >> faceting will be somewhat faster. But after the first few user
> >> queries have gone through, it won't matter whether you've
> >> pre-loaded the cache or not.
> >>
> >> My point is that you'll get the benefit of the cache no matter what,
> >> it's just a matter of whether it's important that the first few users
> >> don't have to wait while they're loaded. And with DocValues,
> >> as Toke recommends, even that may be unimportant.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen <te@statsbiblioteket.dk
> >
> >> wrote:
> >> > Manohar Sripada [manohar211@gmail.com] wrote:
> >> >> From the wiki, it states that
> >> >> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly
> used
> >> for
> >> >> faceting.
> >> >
> >> >> Can someone please throw some light on how to load data to this
> cache.
> >> Like
> >> >> on what solrquery option does this consider the data to be loaded to
> >> this
> >> >> cache.
> >> >
> >> > The values are loaded on first facet call with facet.method=fc.
> >> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
> >> >
> >> >> My requirement is I have 10 facet fields (with facetlimit - 5) to be
> >> shown
> >> >> in my UI. I want to speed up this by using this cache. Is there a way
> >> where
> >> >> I can specify only the list of fields to be loaded to FieldValue
> Cache?
> >> >
> >> > Add a facet call as explicit warmup in your solrconfig.xml.
> >> >
> >> > You might want to consider DocValues for your facet fields.
> >> > https://cwiki.apache.org/confluence/display/solr/DocValues
> >> >
> >> > - Toke Eskildsen
> >>
>

Re: Loading data to FieldValueCache

Posted by Erick Erickson <er...@gmail.com>.
By and large, don't use the enum method unless there are _very_
few unique values. It forms a filter (size roughly maxDoc/8 bytes)
for _every_ unique value in the field, i.e. if you have 10,000 unique
values it'll try to form 10,000 filterCache entries. Let the system
do this for you automatically if appropriate.
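
To put rough numbers on that, here is a quick Python sketch of the upper
bound: one maxDoc/8-byte bitset per unique value. The function name is made
up for illustration, and the 100 million maxDoc is an assumption that treats
the index size mentioned elsewhere in this thread as a single core. Small
result sets may be stored more compactly, so real usage can be lower.

```python
def enum_facet_cache_bytes(unique_values: int, max_doc: int) -> int:
    """Upper bound: one maxDoc/8-byte bitset per unique value in the field."""
    return unique_values * (max_doc // 8)


MAX_DOC = 100_000_000  # assumed: whole index treated as a single core

for unique_values in (100, 10_000):
    gb = enum_facet_cache_bytes(unique_values, MAX_DOC) / 1e9
    print(f"{unique_values} unique values -> about {gb:.2f} GB")
```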

Best,
Erick

On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada <ma...@gmail.com> wrote:
> Thanks Erick and Toke,
>
> Also, I read here <https://wiki.apache.org/solr/SolrCaching#filterCache> that,
> filterCache can also be used for faceting with facet.method=enum. So, I am
> bit confused here on which one to use for faceting.
>
> One more thing here is I have different types of facets. (For example -
> Product List, States). The Product List facet has lot many unique values
> (around 10 million), where as States list will be in hundreds. So, I want
> to come up with the numbers for size of fieldValueCache/filterCache and
> pre-populate this.
>
> Thanks,
> Manohar
>
> On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> Or just not worry about it. The cache will be filled up automatically
>> as you query for facets etc., the benefit to trying to fill it up as
>> Toke outlines is just that the first few user queries that call for
>> faceting will be somewhat faster. But after the first few user
>> queries have gone through, it won't matter whether you've
>> pre-loaded the cache or not.
>>
>> My point is that you'll get the benefit of the cache no matter what,
>> it's just a matter of whether it's important that the first few users
>> don't have to wait while they're loaded. And with DocValues,
>> as Toke recommends, even that may be unimportant.
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
>> wrote:
>> > Manohar Sripada [manohar211@gmail.com] wrote:
>> >> From the wiki, it states that
>> >> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used
>> for
>> >> faceting.
>> >
>> >> Can someone please throw some light on how to load data to this cache.
>> Like
>> >> on what solrquery option does this consider the data to be loaded to
>> this
>> >> cache.
>> >
>> > The values are loaded on first facet call with facet.method=fc.
>> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
>> >
>> >> My requirement is I have 10 facet fields (with facetlimit - 5) to be
>> shown
>> >> in my UI. I want to speed up this by using this cache. Is there a way
>> where
>> >> I can specify only the list of fields to be loaded to FieldValue Cache?
>> >
>> > Add a facet call as explicit warmup in your solrconfig.xml.
>> >
>> > You might want to consider DocValues for your facet fields.
>> > https://cwiki.apache.org/confluence/display/solr/DocValues
>> >
>> > - Toke Eskildsen
>>

Re: Loading data to FieldValueCache

Posted by Manohar Sripada <ma...@gmail.com>.
Thanks Erick and Toke,

Also, I read here <https://wiki.apache.org/solr/SolrCaching#filterCache> that
the filterCache can also be used for faceting with facet.method=enum. So I am
a bit confused about which one to use for faceting.

One more thing: I have different types of facets (for example, Product List
and States). The Product List facet has a great many unique values (around
10 million), whereas the States list will be in the hundreds. So I want to
come up with numbers for the size of the fieldValueCache/filterCache and
pre-populate it.

Thanks,
Manohar

On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson <er...@gmail.com>
wrote:

> Or just not worry about it. The cache will be filled up automatically
> as you query for facets etc., the benefit to trying to fill it up as
> Toke outlines is just that the first few user queries that call for
> faceting will be somewhat faster. But after the first few user
> queries have gone through, it won't matter whether you've
> pre-loaded the cache or not.
>
> My point is that you'll get the benefit of the cache no matter what,
> it's just a matter of whether it's important that the first few users
> don't have to wait while they're loaded. And with DocValues,
> as Toke recommends, even that may be unimportant.
>
> Best,
> Erick
>
> On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
> > Manohar Sripada [manohar211@gmail.com] wrote:
> >> From the wiki, it states that
> >> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used
> for
> >> faceting.
> >
> >> Can someone please throw some light on how to load data to this cache.
> Like
> >> on what solrquery option does this consider the data to be loaded to
> this
> >> cache.
> >
> > The values are loaded on first facet call with facet.method=fc.
> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
> >
> >> My requirement is I have 10 facet fields (with facetlimit - 5) to be
> shown
> >> in my UI. I want to speed up this by using this cache. Is there a way
> where
> >> I can specify only the list of fields to be loaded to FieldValue Cache?
> >
> > Add a facet call as explicit warmup in your solrconfig.xml.
> >
> > You might want to consider DocValues for your facet fields.
> > https://cwiki.apache.org/confluence/display/solr/DocValues
> >
> > - Toke Eskildsen
>

Re: Loading data to FieldValueCache

Posted by Erick Erickson <er...@gmail.com>.
Or just not worry about it. The cache will be filled up automatically
as you query for facets etc. The benefit of trying to fill it up as
Toke outlines is just that the first few user queries that call for
faceting will be somewhat faster. But after the first few user
queries have gone through, it won't matter whether you've
pre-loaded the cache or not.

My point is that you'll get the benefit of the cache no matter what;
it's just a matter of whether it's important that the first few users
don't have to wait while the caches are loaded. And with DocValues,
as Toke recommends, even that may be unimportant.

Best,
Erick

On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> Manohar Sripada [manohar211@gmail.com] wrote:
>> From the wiki, it states that
>> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for
>> faceting.
>
>> Can someone please throw some light on how to load data to this cache. Like
>> on what solrquery option does this consider the data to be loaded to this
>> cache.
>
> The values are loaded on first facet call with facet.method=fc.
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
>
>> My requirement is I have 10 facet fields (with facetlimit - 5) to be shown
>> in my UI. I want to speed up this by using this cache. Is there a way where
>> I can specify only the list of fields to be loaded to FieldValue Cache?
>
> Add a facet call as explicit warmup in your solrconfig.xml.
>
> You might want to consider DocValues for your facet fields.
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> - Toke Eskildsen

RE: Loading data to FieldValueCache

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Manohar Sripada [manohar211@gmail.com] wrote:
> From the wiki, it states that
> http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for
> faceting.

> Can someone please throw some light on how to load data to this cache. Like
> on what solrquery option does this consider the data to be loaded to this
> cache.

The values are loaded on first facet call with facet.method=fc.
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

> My requirement is I have 10 facet fields (with facetlimit - 5) to be shown
> in my UI. I want to speed up this by using this cache. Is there a way where
> I can specify only the list of fields to be loaded to FieldValue Cache?

Add a facet call as explicit warmup in your solrconfig.xml.
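
As a sketch of what such a warmup entry might look like, the Python snippet
below prints a firstSearcher listener that uses solr.QuerySenderListener,
with one facet query per field. The field names are hypothetical and rows=0
is only there to keep the warmup responses small; the printed element would
go inside the <query> section of solrconfig.xml.

```python
import xml.etree.ElementTree as ET

FACET_FIELDS = ["state", "products"]  # hypothetical field names

# <listener event="firstSearcher" class="solr.QuerySenderListener">
listener = ET.Element(
    "listener",
    {"event": "firstSearcher", "class": "solr.QuerySenderListener"},
)
queries = ET.SubElement(listener, "arr", name="queries")

# One warmup query per facet field, each a plain facet request.
for field in FACET_FIELDS:
    query = ET.SubElement(queries, "lst")
    for name, value in [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
    ]:
        ET.SubElement(query, "str", name=name).text = value

listener_xml = ET.tostring(listener, encoding="unicode")
print(listener_xml)
```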

You might want to consider DocValues for your facet fields.
https://cwiki.apache.org/confluence/display/solr/DocValues

- Toke Eskildsen