Posted to solr-user@lucene.apache.org by David Smith <ds...@yahoo.com.INVALID> on 2014/12/16 18:17:05 UTC

Identical query returning different aggregate results

I have a prototype SolrCloud 4.10.2 setup with 13 collections (1 shard and 1 replica each) and a separate single-node ZooKeeper 3.4.6.
The very first app test case I wrote fails intermittently in this environment, with only 4 documents ingested into the cloud.
I dug in and found that when I query against multiple collections using the "collection=" parameter, the aggregates I request are correct only about 50% of the time.  The other 50% of the time, the aggregate returned by Solr is wrong. Note that this is for the identical query: I can run the same query several times in a row and get different answers.

The simplest version of the query that still exhibits the odd behavior is as follows:
http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
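
The one-line URL is hard to scan, so here is a minimal sketch that splits it into its individual parameters with Python's standard library. The URL is copied from the post; the names `QUERY_URL` and `query_params` are illustrative only and do not come from Solr.

```python
from urllib.parse import urlsplit, parse_qs

# The query URL exactly as posted above, split across lines for readability.
QUERY_URL = (
    "http://192.168.59.103:8985/solr/query_handler/query"
    "?facet.range=eventDate"
    "&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z"
    "&f.eventDate.facet.range.gap=%2B1DAY"
    "&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10"
    "&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z"
    "&q=*:*&f.eventDate.facet.mincount=1&facet=true"
)


def query_params(url):
    """Return the URL's query parameters as a {name: value} dict."""
    pairs = parse_qs(urlsplit(url).query)
    # Each parameter appears exactly once in this URL, so keep the single value.
    return {name: values[0] for name, values in pairs.items()}


if __name__ == "__main__":
    for name, value in sorted(query_params(QUERY_URL).items()):
        print(f"{name} = {value}")
```

Decoded this way, the gap reads `+1DAY` (the `%2B` is just the URL-encoded plus sign), and the per-field `f.eventDate.facet.mincount=1` setting is easy to spot.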

When it SUCCEEDS, the aggregate correctly appears like this:

  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "eventDate":{
        "counts":[
          "2014-04-01T00:00:00Z",3],
        "gap":"+1DAY",
        "start":"2014-01-01T00:00:00Z",
        "end":"2015-01-01T00:00:00Z"}},
    "facet_intervals":{}}}

When it FAILS, note that the counts[] array is empty:

  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "eventDate":{
        "counts":[],
        "gap":"+1DAY",
        "start":"2014-01-01T00:00:00Z",
        "end":"2015-01-01T00:00:00Z"}},
    "facet_intervals":{}}}

If I further simplify the query, by removing range options or reducing to one (1) collection name, then the problem goes away.

The solr logs are clean at INFO level, and there is no substantive difference in log output when the query succeeds vs fails, leaving me stumped where to look next.  Suggestions welcome.
Regards,
David
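
One way to put a number on "about 50% of the time" is to repeat the request and tally the distinct answers. The sketch below is only that, a sketch: `fetch_range_counts` and `tally` are hypothetical helper names, and it assumes the handler returns JSON shaped like the excerpts in the post.

```python
import json
import urllib.request
from collections import Counter


def fetch_range_counts(url):
    """Fetch a Solr JSON response and return the eventDate range counts list."""
    with urllib.request.urlopen(url) as response:
        body = json.load(response)
    return body["facet_counts"]["facet_ranges"]["eventDate"]["counts"]


def tally(fetch, url, attempts=20):
    """Call fetch(url) repeatedly and count how often each distinct answer occurs."""
    seen = Counter()
    for _ in range(attempts):
        # Serialize each answer so that lists can be used as Counter keys.
        seen[json.dumps(fetch(url))] += 1
    return seen
```

Calling `tally(fetch_range_counts, url)` against the live cluster would then show how many of the 20 attempts came back with an empty `counts` list. Passing the fetch function in as an argument also makes it possible to try the tally logic without a running Solr.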





Re: Identical query returning different aggregate results

Posted by Erick Erickson <er...@gmail.com>.
Wow, advancing senility... _I'm_ actually the person that committed that fix...

Siiigggghhh.


Re: Identical query returning different aggregate results

Posted by David Smith <ds...@yahoo.com.INVALID>.
Chris,

Yes, your suggestion worked.  Changing the parameter in my query from 

...f.eventDate.facet.mincount=1...


to

...f.eventDate.facet.mincount=0...


worked around the problem. And I agree that SOLR-6154 describes what I observed almost exactly.  Once 5.0 is available, I'll test this again with "mincount=1".
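
A side effect of the mincount=0 workaround is that the response now lists every bucket in the range, including the empty ones. If the application still wants the mincount=1 view, the zero buckets can be dropped on the client. This is a minimal sketch assuming the flat label/count list shown in the JSON excerpts of the original post; `nonzero_buckets` is a made-up helper name.

```python
def nonzero_buckets(flat_counts, mincount=1):
    """Convert Solr's flat [label, count, label, count, ...] list into
    (label, count) pairs, keeping only buckets with count >= mincount."""
    labels = flat_counts[0::2]   # even positions hold the bucket start dates
    counts = flat_counts[1::2]   # odd positions hold the document counts
    return [(label, count) for label, count in zip(labels, counts)
            if count >= mincount]
```

For example, `nonzero_buckets(["2014-03-31T00:00:00Z", 0, "2014-04-01T00:00:00Z", 3])` keeps only the April 1 bucket.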

Thanks everyone for your help! It is very much appreciated.

Regards,
David 


Re: Identical query returning different aggregate results

Posted by Chris Hostetter <ho...@fucit.org>.
sounds like this bug...

https://issues.apache.org/jira/browse/SOLR-6154

...in which case it has nothing to do with your use of multiple
collections, it's just dependent on whether or not the first node to
respond happens to have a doc in every "range bucket" .. any bucket
missing (because of your mincount=1) from the first core to
respond is then ignored in the response from the subsequent cores.

workaround is to set mincount=0 for your facet ranges.
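
The behavior described above can be illustrated with a toy model. To be clear, this is not Solr's merge code, only a simplified sketch of the symptom: the first response fixes the set of buckets, and counts for any other bucket are dropped. The names `shard_response` and `merge_buggy` are invented for the illustration.

```python
def shard_response(counts_by_bucket, all_buckets, mincount):
    """What one core reports: only buckets whose count is >= mincount."""
    return {bucket: counts_by_bucket.get(bucket, 0)
            for bucket in all_buckets
            if counts_by_bucket.get(bucket, 0) >= mincount}


def merge_buggy(responses):
    """Toy model of the bug: only buckets present in the FIRST response survive."""
    merged = dict(responses[0])
    for response in responses[1:]:
        for bucket, count in response.items():
            if bucket in merged:   # buckets unknown to the first core are ignored
                merged[bucket] += count
    return merged


if __name__ == "__main__":
    buckets = ["2014-03-31T00:00:00Z", "2014-04-01T00:00:00Z"]
    for mincount in (1, 0):
        empty = shard_response({}, buckets, mincount)                            # like 2014_03
        full = shard_response({"2014-04-01T00:00:00Z": 3}, buckets, mincount)    # like 2014_04
        print(mincount, merge_buggy([empty, full]), merge_buggy([full, empty]))
```

With mincount=1 the result depends on which core answers first, which matches the intermittent empty counts[] in the report. With mincount=0 every core reports every bucket, so the order no longer matters.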




-Hoss
http://www.lucidworks.com/

Re: Identical query returning different aggregate results

Posted by Erick Erickson <er...@gmail.com>.
Ah, OK. I didn't get that when I read your first e-mail...

Hmmm, this is still a puzzle then. Tail the respective Solr logs: you _should_
see the sub-query go to each of them, and the sub-query _should_
carry along all of the faceting information. Or this might just be a flat-out bug...

Best,
Erick


Re: Identical query returning different aggregate results

Posted by David Smith <ds...@yahoo.com.INVALID>.
Hi Erick,
Thanks for your reply.
My test environment only has one shard and one replica per collection.  So, I think there is no possibility of replicas getting out of sync.  Here is how I create each (month-based) collection:
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_01&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_02&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_03&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
...etc, etc...
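
Since the CREATE calls differ only in the collection name, they can be generated rather than typed. A minimal sketch, assuming the same host, port and config name as the URLs above; the helper name `create_collection_url` and the 12-month range are illustrative, because the thread does not list all 13 collection names.

```python
from urllib.parse import urlencode


def create_collection_url(base, name, config_name="main_conf"):
    """Build a Collections API CREATE URL for a 1-shard, 1-replica collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
        "maxShardsPerNode": 1,
        "collection.configName": config_name,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


# Illustrative range only: one month-based collection per month of 2014.
CREATE_URLS = [
    create_collection_url("http://192.168.59.103:8983/solr", f"2014_{month:02d}")
    for month in range(1, 13)
]

if __name__ == "__main__":
    for url in CREATE_URLS:
        print(url)
```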

Still, I think you are on to something.  I had already noticed that querying one collection at a time works.  For example, if I change my query oh-so-slightly from this:

"....collection=2014_04,2014_03...."

to this

"...collection=2014_04...."

Then, the results are correct 100% of the time. I think substantively this is the same as specifying the name of the shard since, again, in my test environment I only have one shard per collection anyway.
I should mention that the "2014_03" collection is empty.  0 documents.  All 3 documents which satisfy the facet range are in the "2014_04" collection.  So, it's a real head-scratcher that introducing that collection name into the query makes the results misbehave.
Kind regards,
David

Re: Identical query returning different aggregate results

Posted by Erick Erickson <er...@gmail.com>.
bq: Facet counts include deleted documents until the segments merge

Whoa! Facet counts do _not_ require segment merging to be accurate.
What merging does is remove the _term_ information associated with
deleted documents, and remove their contribution to the TF/IDF
scores.

David:
Hmmm, what happens if you direct the query not only to a single
collection, but to a single shard? Add &distrib=false to the query and
point it to each of your replicas (one collection at a time). The
expectation is that each replica for a slice within a collection has
identical documents.
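
For anyone wanting to script that check, here is a minimal sketch that builds one non-distributed query URL per core. The core names follow the usual collection_shardN_replicaM pattern but are assumptions, as are the `/query` handler path and the helper name `direct_core_url`; substitute whatever the Solr admin UI shows for your cluster.

```python
from urllib.parse import urlencode


def direct_core_url(base, core, params):
    """Build a query URL aimed at a single core with distributed search turned off."""
    query = dict(params, distrib="false")   # copy the params and add distrib=false
    return f"{base}/{core}/query?{urlencode(query)}"


# Hypothetical core names; check the actual names in your own cluster.
CORES = ["2014_03_shard1_replica1", "2014_04_shard1_replica1"]
PARAMS = {"q": "*:*", "rows": 10, "facet": "true", "facet.range": "eventDate"}

if __name__ == "__main__":
    for core in CORES:
        print(direct_core_url("http://192.168.59.103:8985/solr", core, PARAMS))
```

Comparing the per-core answers side by side shows whether any single core disagrees with the others before the results are merged.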

One possibility is that somehow your shards are out of sync on a
collection. So the internal load balancing that happens sometimes
sends the query to one replica and sometimes to another. 2 replicas
(leader and follower) and 50% failure, coincidence?

That just bumps the question up another level of course: the next
question is _why_ the shard is out of sync. So in that case I'd issue
a commit to all the collections on the off chance that somehow that
didn't happen and try again (very low probability that this is the
root cause, but you never know).

But it sure sounds like one replica doesn't agree with another, so the
above will give us a place to look.

Best,
Erick




Re: Identical query returning different aggregate results

Posted by David Smith <ds...@yahoo.com.INVALID>.
Alex,
Good suggestion, but in this case, no.  This example is from a cleanroom-type test environment where the collections were created very recently, there are only 4 documents in total across all collections, and no deletes have been issued.
Kind regards,
David
 


Re: Identical query returning different aggregate results

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Facet counts include deleted documents until the segments merge. Could that
be an issue?

Regards,
     Alex