Posted to solr-user@lucene.apache.org by Daniel Angelov <da...@gmail.com> on 2017/06/02 05:40:32 UTC

Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Is the filter cache separate for each host and then for each collection and
then for each shard and then for each replica in SolrCloud?
For example, on host1 we have coll1 shard1 replica1 and coll2 shard1
replica1; on host2 we have coll1 shard2 replica2 and coll2 shard2
replica2. Does this mean that we have 4 filter caches, i.e. separate
memory for each core?
If they are separate and, for example, query1 is handled by coll1 shard1
replica1 and 1 sec later the same query is handled by coll2 shard1
replica1, this means that the later query will not use the result set
cached from the first query...

BR
Daniel

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Erick Erickson <er...@gmail.com>.
bq: fq value, say 200000 char....

Well, my guess here is that you're constructing a huge OR clause
(that's the usual case for such large fq clauses).

It's rare for such a clause to be generated identically very often. Do
you really expect to have this _exact_ clause created over and over
and over and over? Even one character's difference defeats reuse, and
so does a different term order: a clause like fq=id:(a OR b) will not
be reused for fq=id:(b OR a).

So consider using the TermsQParserPlugin and setting cache=false on the fq clause.
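
For example, something along these lines (a sketch only; the field name
"id" and the values are placeholders for your own):

    fq={!terms f=id cache=false}12345,67890,24680

The terms parser takes a comma-separated list instead of a giant OR
clause, and cache=false keeps the entry out of the filterCache entirely.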

Best,
Erick




Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Daniel Angelov <da...@gmail.com>.
In this case, for example:
http://host1:8983/solr/collName/admin/mbeans?stats=true
returns stats in the context of the "collName" shard living on host1,
doesn't it?
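
And to look at just the filter cache of that core, presumably something
like this would work (using the cat/key parameters of the mbeans
handler):

    http://host1:8983/solr/collName/admin/mbeans?stats=true&cat=CACHE&key=filterCache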

BR
Daniel


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Daniel Angelov <da...@gmail.com>.
Sorry for the typos in the previous mail, "fg" should be "fq"


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Daniel Angelov <da...@gmail.com>.
This means that querying an alias NNN pointing to 3 collections, each
with 10 shards and each shard with 2 replicas, using a query with a very
long fq value, say a 200000 char string: the first query with that fq
will cache all 200000 chars 30 times (3 collections x 10 shards = 30
cores). The next query with the same fq might not hit the same cores as
the first time, i.e. it could allocate more memory in the replicas the
first query did not touch. And in my case the soft commit is every 60
sec, so this means a lot of GC, doesn't it?
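
In solrconfig.xml that interval is configured along these lines (a
sketch; 60000 ms = 60 sec):

    <autoSoftCommit>
      <maxTime>60000</maxTime>
    </autoSoftCommit>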

BR
Daniel


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Erick Erickson <er...@gmail.com>.
bq: This means, if we have a collection with 2 replicas, there is a
chance that 2 queries with identical fq values can be served by
different replicas of the same shard, which means that the second query
will not use the cached set from the first query, doesn't it?

Yes. In practice autowarming is often used to pre-warm the caches, but
again that's local to each replica, i.e. the fqs used to autowarm
replica1 of shard1 may be different than the ones used to autowarm
replica2 of shard1. What tends to happen is that the replicas "level
out". Any fq clause that's common enough to be useful eventually hits
all the replicas. And the most common ones are run during autowarming
since the cache is an LRU queue.

To understand why there isn't a common cache, consider that the
filterCache is conceptually a map. The key is the fq clause and the
value is a bitset where each bit corresponds to the _internal_ Lucene
document ID which is just an integer 0-maxDoc. There are two critical
points here:

1> the internal ID changes when segments are merged
2> different replicas will have different _internal_ ids for the same
document. By "same" here I mean have the same <uniqueKey>.

So completely sidestepping the question of the propagation delays of
trying to consult some kind of central filterCache, the nature of that
cache is such that you couldn't share it between replicas anyway.
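
To make that concrete, in rough pseudo-Java (a sketch only, nothing
like Solr's actual classes), each core effectively holds something like:

    import java.util.BitSet;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of a per-core filterCache. The values are bitsets over THIS
    // core's internal Lucene doc ids, so an entry is meaningless to any
    // other replica, even one holding the "same" documents.
    public class FilterCacheSketch {
        // access-ordered LinkedHashMap gives simple LRU eviction
        private final Map<String, BitSet> cache;

        public FilterCacheSketch(final int maxEntries) {
            this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        // key: the raw fq string; value: one bit per internal doc id, 0..maxDoc-1
        public BitSet get(String fq) { return cache.get(fq); }

        public void put(String fq, BitSet matchingDocs) { cache.put(fq, matchingDocs); }
    }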

Best,
Erick


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Daniel Angelov <da...@gmail.com>.
Thanks for the answer!
This means, if we have a collection with 2 replicas, there is a chance
that 2 queries with identical fq values can be served by different
replicas of the same shard, which means that the second query will not
use the cached set from the first query, doesn't it?

Thanks
Daniel


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Susheel Kumar <su...@gmail.com>.
Thanks for the correction, Shawn.  Yes, it's only the heap allocation
settings that are per host/JVM.


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/1/2017 11:40 PM, Daniel Angelov wrote:
> Is the filter cache separate for each host and then for each
> collection and then for each shard and then for each replica in
> SolrCloud? For example, on host1 we have coll1 shard1 replica1 and
> coll2 shard1 replica1; on host2 we have coll1 shard2 replica2 and
> coll2 shard2 replica2. Does this mean that we have 4 filter caches,
> i.e. separate memory for each core? If they are separate and, for
> example, query1 is handled by coll1 shard1 replica1 and 1 sec later
> the same query is handled by coll2 shard1 replica1, this means that
> the later query will not use the result set cached from the first
> query...

That is correct.

General notes about SolrCloud terminology: SolrCloud is organized around
collections.  Collections are made up of one or more shards.  Shards are
made up of one or more replicas.  Each replica is a Solr core.  A core
contains one Lucene index.  It is not correct to say that a shard has no
replicas.  The leader *is* a replica.  If you have a leader and one
follower, the shard has two replicas.

Solr caches (including filterCache) exist at the core level; they have
no knowledge of other replicas, other shards, or the collection as a
whole.  Susheel says that the caches are per host/JVM -- that's not
correct.  Every Solr core in a JVM has separate caches, if they are
defined in the configuration for that core.
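
A typical per-core definition in that core's solrconfig.xml looks
something like this (the class and sizes here are just example values):

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>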

Your query scenario has even more separation -- it asks about querying
two completely different collections, which don't use the same cores.

Thanks,
Shawn


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

Posted by Susheel Kumar <su...@gmail.com>.
The heap allocation and cache settings are per host/JVM, not per
collection/shard. In SolrCloud you execute queries against a collection,
and every other collection may have a different schema, document ids,
and so on.  So, to answer your question: query1 against coll1 can't use
results cached from a query against coll2.

Thnx

On Fri, Jun 2, 2017 at 1:40 AM, Daniel Angelov <da...@gmail.com>
wrote:

> Is the filter cache separate for each host and then for each collection and
> then for each shard and then for each replica in SolrCloud?
> For example, on host1 we have coll1 shard1 replica1 and coll2 shard1
> replica1; on host2 we have coll1 shard2 replica2 and coll2 shard2
> replica2. Does this mean that we have 4 filter caches, i.e. separate
> memory for each core?
> If they are separate and, for example, query1 is handled by coll1 shard1
> replica1 and 1 sec later the same query is handled by coll2 shard1
> replica1, this means that the later query will not use the result set
> cached from the first query...
>
> BR
> Daniel
>