Posted to solr-user@lucene.apache.org by "S.L" <si...@gmail.com> on 2014/10/02 23:51:33 UTC

SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Hi All,

I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a
replication factor of 2.

I have fronted these 6 Solr nodes with a load balancer. What I notice is
that every time I do a search of the form
q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result
only once in every 3 tries, telling me that the load balancer is
distributing the requests between the 3 shards and SolrCloud only returns a
result if the request goes to the core that has that id.

However, if I do a simple search like q=*:*, I consistently get the right
aggregated results back of all the documents across all the shards for
every request from the load balancer. Can someone please let me know what
this is symptomatic of?

Somehow SolrCloud seems to be doing search query distribution and
aggregation for queries of type *:* only.

Thanks.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Posted by Erick Erickson <er...@gmail.com>.
Hmmmm. Assuming that you aren't re-indexing the doc you're searching for...

Try issuing http://blah blah:8983/solr/collection/update?commit=true.
That'll force all the docs to be searchable. Does <1> still hold for
the document in question? Because this is exactly backwards of what
I'd expect. I'd expect, if anything, the replica (I'm trying to call
it the "follower" when a distinction needs to be made since the leader
is a "replica" too....) would be out of sync. This is still a Bad
Thing, but the leader gets first crack at indexing things.
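
A minimal sketch of how the leader/follower comparison could be checked once the per-replica hit counts have been collected with distrib=false. The function name, the shard names and the replica labels are illustrative assumptions, not anything from the cluster discussed in this thread.

```python
def out_of_sync_shards(counts_by_shard):
    """Return the shards whose replicas disagree on the hit count.

    counts_by_shard maps a shard name to {replica label: numFound},
    where each numFound came from a distrib=false query sent to that replica.
    """
    return sorted(
        shard
        for shard, replica_counts in counts_by_shard.items()
        # More than one distinct count means the replicas hold different data.
        if len(set(replica_counts.values())) > 1
    )


# Hypothetical observations for a single-id query (names are made up).
observed = {
    "shard1": {"leader": 0, "follower": 1},
    "shard2": {"leader": 0, "follower": 0},
    "shard3": {"leader": 0, "follower": 0},
}
print(out_of_sync_shards(observed))
```

If the list is still non-empty after an explicit commit, the disagreement is probably not just an unexpired autocommit interval.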

bq: only the replica of the shard that has this key returns the result,
and the leader does not,

Just to be sure we're talking about the same thing. When you say
"leader", you mean the shard leader, right? The filled-in circle on
the graph view from the admin/cloud page.

And let's see your soft and hard commit settings please.

Best,
Erick

On Thu, Oct 2, 2014 at 9:48 PM, S.L <si...@gmail.com> wrote:
> Erick,
>
> 0> The load balancer is out of the picture.
>
> 1> When I query with *distrib=false*, I get consistent results, as expected,
> for those shards that don't have the key, i.e. I don't get results back for
> those shards. However, I just realized that with *distrib=false* present
> in the query for the shard that is supposed to contain the key, only the
> replica of the shard that has this key returns the result, and the leader
> does not. It looks like the replica and the leader do not have the same data,
> and the replica seems to contain the key in the query for that shard.
>
> 2> By indexing I mean this collection is being populated by a web crawler.
>
> So it looks like 1> above is pointing to the leader and replica being out of
> sync for at least one shard.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Posted by "S.L" <si...@gmail.com>.
Erick,

0> The load balancer is out of the picture.

1> When I query with *distrib=false*, I get consistent results, as expected,
for those shards that don't have the key, i.e. I don't get results back for
those shards. However, I just realized that with *distrib=false* present
in the query for the shard that is supposed to contain the key, only the
replica of the shard that has this key returns the result, and the leader
does not. It looks like the replica and the leader do not have the same data,
and the replica seems to contain the key in the query for that shard.

2> By indexing I mean this collection is being populated by a web crawler.

So it looks like 1> above is pointing to the leader and replica being out of
sync for at least one shard.



On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <er...@gmail.com>
wrote:

> bq: Also, the collection is being actively indexed as I query this, could
> that be an issue too?
>
> Not if the documents you're searching aren't being added as you search
> (and all your autocommit intervals have expired).
>
> I would turn off indexing for testing, it's just one more variable
> that can get in the way of understanding this.
>
> Do note that if the problem were endemic to Solr, there would probably
> be a _lot_ more noise out there.
>
> So to recap:
> 0> we can take the load balancer out of the picture altogether.
>
> 1> when you query each shard individually with &distrib=false, every
> replica in a particular shard returns the same count.
>
> 2> when you query without &distrib=false you get varying counts.
>
> This is very strange and not at all expected. Let's try it again
> without indexing going on....
>
> And what do you mean by "indexing" anyway? How are documents being fed
> to your system?
>
> Best,
> Erick@PuzzledAsWell

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Posted by Erick Erickson <er...@gmail.com>.
bq: Also, the collection is being actively indexed as I query this, could that
be an issue too?

Not if the documents you're searching aren't being added as you search
(and all your autocommit intervals have expired).

I would turn off indexing for testing, it's just one more variable
that can get in the way of understanding this.

Do note that if the problem were endemic to Solr, there would probably
be a _lot_ more noise out there.

So to recap:
0> we can take the load balancer out of the picture altogether.

1> when you query each shard individually with &distrib=false, every
replica in a particular shard returns the same count.

2> when you query without &distrib=false you get varying counts.

This is very strange and not at all expected. Let's try it again
without indexing going on....
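
One way to put a number on "varying counts" is to repeat the same query many times and tally the distinct hit counts that come back. The sketch below is only an illustration: fetch_count is a hypothetical callable that runs the query once and returns numFound, and the fake sequence stands in for real responses.

```python
from collections import Counter


def tally_counts(fetch_count, attempts=30):
    """Call fetch_count() repeatedly and tally the distinct hit counts seen."""
    return Counter(fetch_count() for _ in range(attempts))


def is_consistent(tally):
    """True when every attempt returned the same hit count."""
    return len(tally) <= 1


# Stand-in for a real query: a hit on one try in three, as reported in this thread.
fake_results = iter([1, 0, 0, 1, 0, 0])
tally = tally_counts(lambda: next(fake_results), attempts=6)
print(tally, is_consistent(tally))
```

Running the same tally once with distrib=false against each node and once without it would show whether the variation appears only on the distributed path.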

And what do you mean by "indexing" anyway? How are documents being fed
to your system?

Best,
Erick@PuzzledAsWell

On Thu, Oct 2, 2014 at 7:32 PM, S.L <si...@gmail.com> wrote:
> Erick,
>
> I would like to add that the interesting behavior, i.e. point #2 that I
> mentioned in my earlier reply, happens in all the shards. If this were
> a distributed search issue, it should not have manifested itself in the
> shard that contains the key that I am searching for. It looks like the search
> is just failing as a whole intermittently.
>
> Also, the collection is being actively indexed as I query this. Could that
> be an issue too?
>
> Thanks.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Posted by "S.L" <si...@gmail.com>.
Erick,

I would like to add that the interesting behavior, i.e. point #2 that I
mentioned in my earlier reply, happens in all the shards. If this were
a distributed search issue, it should not have manifested itself in the
shard that contains the key that I am searching for. It looks like the search
is just failing as a whole intermittently.

Also, the collection is being actively indexed as I query this. Could that
be an issue too?

Thanks.

On Thu, Oct 2, 2014 at 10:24 PM, S.L <si...@gmail.com> wrote:

> Erick,
>
> Thanks for your reply, I tried your suggestions.
>
> 1. When not using the load balancer, if *I have distrib=false* I get
> consistent results across the replicas.
>
> 2. However, here's the interesting part: while not using the load balancer, if
> I *don't have distrib=false*, then when I query a particular node, I get
> the same behaviour as if I were using a load balancer, meaning the
> distributed search from a node works intermittently. Does this give any
> clue?

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Posted by "S.L" <si...@gmail.com>.
Erick,

Thanks for your reply, I tried your suggestions.

1. When not using the load balancer, if *I have distrib=false* I get consistent
results across the replicas.

2. However, here's the interesting part: while not using the load balancer, if
I *don't have distrib=false*, then when I query a particular node, I get
the same behaviour as if I were using a load balancer, meaning the
distributed search from a node works intermittently. Does this give any
clue?



On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <er...@gmail.com>
wrote:

> Hmmm, nothing quite makes sense here....
>
> Here are some experiments:
> 1> avoid the load balancer and issue queries like
> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
>
> The &distrib=false bit will keep SolrCloud from trying to send
> the queries anywhere, they'll be served only from the node you address
> them to.
> That'll help check whether the nodes are consistent. You should be
> getting back the same results from each replica in a shard (i.e. 2 of
> your 6 machines).
>
> Next, try your failing query the same way.
>
> Next, try your failing query from a browser, pointing it at successive
> nodes.
>
> Where is the first place problems show up?
>
> My _guess_ is that your load balancer isn't quite doing what you think, or
> your cluster isn't set up the way you think it is, but those are guesses.
>
> Best,
> Erick

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, nothing quite makes sense here....

Here are some experiments:
1> avoid the load balancer and issue queries like
http://solr_server:8983/solr/collection/select?q=whatever&distrib=false

The &distrib=false bit will keep SolrCloud from trying to send
the queries anywhere, they'll be served only from the node you address them to.
That'll help check whether the nodes are consistent. You should be
getting back the same results from each replica in a shard (i.e. 2 of
your 6 machines).
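
A rough sketch of that experiment in Python, for anyone who wants to script it rather than paste URLs into a browser. The node address and collection name below are placeholders, and build_select_url and num_found are hypothetical helper names. The sketch assumes the standard /select handler with wt=json, and only builds the URL and parses a response body, so fetching is left to whatever HTTP client is at hand.

```python
import json
from urllib.parse import urlencode


def build_select_url(node, collection, query, filter_query=None, distrib=False):
    """Build a /select URL aimed at one specific node (names are placeholders)."""
    params = [("q", query), ("wt", "json"), ("rows", "0")]
    if filter_query:
        params.append(("fq", filter_query))
    # distrib=false keeps the node from fanning the query out to other shards.
    params.append(("distrib", "true" if distrib else "false"))
    return f"{node}/solr/{collection}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


url = build_select_url(
    "http://solr1:8983", "collection1", "*:*",
    filter_query="id:9e78c064-919f-4ef3-b236-dc66351b4acf",
)
print(url)
```

Building one such URL per node and comparing num_found across the two replicas of each shard gives the consistency check described above.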

Next, try your failing query the same way.

Next, try your failing query from a browser, pointing it at successive
nodes.

Where is the first place problems show up?

My _guess_ is that your load balancer isn't quite doing what you think, or
your cluster isn't set up the way you think it is, but those are guesses.

Best,
Erick

On Thu, Oct 2, 2014 at 2:51 PM, S.L <si...@gmail.com> wrote:
> Hi All,
>
> I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a
> replication factor of 2.
>
> I have fronted these 6 Solr nodes with a load balancer. What I notice is
> that every time I do a search of the form
> q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result
> only once in every 3 tries, telling me that the load balancer is
> distributing the requests between the 3 shards and SolrCloud only returns a
> result if the request goes to the core that has that id.
>
> However, if I do a simple search like q=*:*, I consistently get the right
> aggregated results back of all the documents across all the shards for
> every request from the load balancer. Can someone please let me know what
> this is symptomatic of?
>
> Somehow SolrCloud seems to be doing search query distribution and
> aggregation for queries of type *:* only.
>
> Thanks.