Posted to solr-user@lucene.apache.org by sling <sl...@gmail.com> on 2013/12/02 09:33:04 UTC

SolrCloud FunctionQuery inconsistency

Hi,
I have a SolrCloud cluster with 4 shards. They are running normally.
How is it possible that the same function query returns different results,
even within the same shard?

However, when I sort by "ptime desc", the result is consistent.
The dateDeboost function generates a time weight from ptime, which is
multiplied by the score.

The results of two identical requests are as follows:
{
  "responseHeader":{
    "status":0,
    "QTime":7,
    "params":{
      "fl":"id",
      "shards":"shard3",
      "cache":"false",
      "indent":"true",
      "start":"0",
      "q":"{!boost b=dateDeboost(ptime)}channelid:0082 && (title:\"abc\" ||
dkeys:\"abc\")",
      "wt":"json",
      "rows":"5"}},
  "response":{"numFound":121,"start":0,"maxScore":0.5319116,"docs":[
      {
        "id":"9EORHN5I00824IHR"},
      {
        "id":"9EOPQGOI00824IMP"},
      {
        "id":"9EMATM6900824IHR"},
      {
        "id":"9EJLBOEN00824IHR"},
      {
        "id":"9E6V45IM00824IHR"}]
  }}



{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "fl":"id",
      "shards":"shard3",
      "cache":"false",
      "indent":"true",
      "start":"0",
      "q":"{!boost b=dateDeboost(ptime)}channelid:0082 && (title:\"abc\" ||
dkeys:\"abc\")",
      "wt":"json",
      "rows":"5"}},
  "response":{"numFound":121,"start":0,"maxScore":0.5319117,"docs":[
      {
        "id":"9EOPQGOI00824IMP"},
      {
        "id":"9EORHN5I00824IHR"},
      {
        "id":"9EMATM6900824IHR"},
      {
        "id":"9EJLBOEN00824IHR"},
      {
        "id":"9E1LP3S300824IHR"}]
  }}





--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: SolrCloud FunctionQuery inconsistency

Posted by sling <sl...@gmail.com>.
Hi Raju,
Collections are a SolrCloud concept; standalone mode has cores instead.
So in standalone mode you can create multiple cores, but not collections.




RE: SolrCloud FunctionQuery inconsistency

Posted by Raju Shikha <ra...@cigniti.com>.
Hi All,

Sorry to ask, but is it possible to create multiple collections in Solr standalone mode? I mean with only one Solr instance. I am able to create multiple collections in a SolrCloud environment, but when I try in standalone mode, it says Solr is not in cloud mode. Any suggestions would be a great help.


Regards,
Raju Shikha


Re: SolrCloud FunctionQuery inconsistency

Posted by Shawn Heisey <el...@elyograg.org>.
On 12/5/2013 2:27 AM, sling wrote:
> By the way, the "shards" param works fine with the value
> "localhost:7574/solr,localhost:8983/solr" or "shard2",
> but it throws an exception with only one replica, "localhost:7574/solr".
> 
> right: shards=204.lead.index.com:9090/solr/doc/,66.index.com:8080/solr/doc/
> wrong: shards=204.lead.index.com:9090/solr/doc/
> Why can't this param be used with only one replica?

I wasn't even aware that you could use the traditional shards parameter
in conjunction with SolrCloud, but it's documented on the wiki, so I
guess you can.

I don't know if the behavior you are seeing should be considered a bug
or if it is expected, but if you want to only query one specific core,
you can do so by sending your request directly to that core with a
"distrib=false" parameter.

I'll leave the question of whether this is a bug or not to someone who
knows how it should operate.
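
As a rough illustration of the "distrib=false" suggestion, the sketch below
builds a select URL aimed at a single core. The helper name core_query_url is
hypothetical, and the host, port and core path are placeholders modelled on
the addresses mentioned in this thread; only the q, fl, rows, wt and distrib
request parameters are real Solr parameters.

```python
from urllib.parse import urlencode


def core_query_url(core_base_url, query, rows=5):
    """Build a select URL that one core answers from its own index only.

    core_base_url is a placeholder such as http://localhost:7574/solr/doc
    (host, port and core name are assumptions, not values to copy).
    """
    params = {
        "q": query,
        "fl": "id,score",
        "rows": rows,
        "wt": "json",
        "distrib": "false",  # do not fan the request out to other cores
    }
    return core_base_url.rstrip("/") + "/select?" + urlencode(params)


url = core_query_url("http://localhost:7574/solr/doc/", "channelid:0082")
print(url)
```

Sending the same URL to each replica of a shard in turn makes it possible to
see which node produced which result.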

Thanks,
Shawn


Re: SolrCloud FunctionQuery inconsistency

Posted by sling <sl...@gmail.com>.
By the way, the "shards" param works fine with the value
"localhost:7574/solr,localhost:8983/solr" or "shard2",
but it throws an exception with only one replica, "localhost:7574/solr".

right: shards=204.lead.index.com:9090/solr/doc/,66.index.com:8080/solr/doc/
wrong: shards=204.lead.index.com:9090/solr/doc/
Why can't this param be used with only one replica?




Re: SolrCloud FunctionQuery inconsistency

Posted by sling <sl...@gmail.com>.
Thank you, Chris.

I noticed that the cron jobs run at different times on the replicas (10
minutes later than on the leader), and these cron jobs reload the dic
(dictionary) files. Therefore, the terms are slightly different between
replicas, which is why the maxScore differs.


Best,
Sling




Re: SolrCloud FunctionQuery inconsistency

Posted by Chris Hostetter <ho...@fucit.org>.
: There is no default value for ptime. It is generated by users.

thank you, that rules out my previous wild guess.

: I was running a function query ({!boost b=dateDeboost(ptime)}
: channelid:0082 && title:abc), which returns different results from the same
: shard (using the param shards=shard3).
: 
: The difference is in maxScore, which is not consistent. And the maxScore is

Ok ... but you still haven't provided enough information for us to make a 
guess as to why you are seeing inconsistent scores coming back from your 
queries -- at a minimum we need to see the debugQuery=true output for each 
of the different replicas that are generating different scores.

It's possible that the discrepancy you are seeing is a minor one resulting 
from slightly different term stats (ie: segments being merged slightly 
differently on different replicas), or it could be a symptom of a larger 
problem.
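
Before reading the debugQuery=true explanations, it can help to summarize how
two replica responses differ. The sketch below is only an illustration: the
function name compare_responses is hypothetical, and the two sample responses
are trimmed copies of the ones posted at the start of this thread (which node
produced each one is not known).

```python
def compare_responses(resp_a, resp_b):
    """Summarize how two Solr JSON responses for the same query differ."""
    ids_a = [doc["id"] for doc in resp_a["response"]["docs"]]
    ids_b = [doc["id"] for doc in resp_b["response"]["docs"]]
    score_a = resp_a["response"]["maxScore"]
    score_b = resp_b["response"]["maxScore"]
    return {
        "max_score_delta": abs(score_a - score_b),
        "order_differs": ids_a != ids_b,
        "only_in_a": sorted(set(ids_a) - set(ids_b)),
        "only_in_b": sorted(set(ids_b) - set(ids_a)),
    }


# Trimmed copies of the two responses from the original post.
first = {"response": {"numFound": 121, "maxScore": 0.5319116, "docs": [
    {"id": "9EORHN5I00824IHR"}, {"id": "9EOPQGOI00824IMP"},
    {"id": "9EMATM6900824IHR"}, {"id": "9EJLBOEN00824IHR"},
    {"id": "9E6V45IM00824IHR"}]}}
second = {"response": {"numFound": 121, "maxScore": 0.5319117, "docs": [
    {"id": "9EOPQGOI00824IMP"}, {"id": "9EORHN5I00824IHR"},
    {"id": "9EMATM6900824IHR"}, {"id": "9EJLBOEN00824IHR"},
    {"id": "9E1LP3S300824IHR"}]}}

summary = compare_responses(first, second)
print(summary)
```

A very small maxScore delta combined with a swapped top pair would fit the
"slightly different term stats" explanation, but it does not prove it.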



-Hoss
http://www.lucidworks.com/

Re: SolrCloud FunctionQuery inconsistency

Posted by sling <sl...@gmail.com>.
Thanks, Chris:
The schema is:
<field name="title" type="textComplex" indexed="true" stored="false"
multiValued="false" omitNorms="true"  />
<field name="dkeys" type="textComplex" indexed="true" stored="false"
multiValued="false" omitNorms="true" />
<field name="ptime" type="date" indexed="true" stored="false"
multiValued="false" omitNorms="true" />

There is no default value for ptime. It is generated by users.

There are 4 shards in this solrcloud, and 2 nodes in each shard.

I was running a function query ({!boost b=dateDeboost(ptime)}
channelid:0082 && title:abc), which returns different results from the same
shard (using the param shards=shard3).

The difference is in maxScore, which is not consistent. And the maxScore is
always either score A or score B.
New docs are being indexed at the same time.
In my opinion, the maxScore should be the same across queries issued within a
very short time, or at least it should not keep switching between score A and
score B.

Also, quite by accident, I found that the sorted result itself can be
inconsistent (say a doc is in the results of one query but not of another,
over and over). That happened once and has not reappeared.


Does this mean that when the query runs, the index on the replica has not yet
synced from its leader, so that querying different nodes of the same shard at
the same time shows different results?






Re: SolrCloud FunctionQuery inconsistency

Posted by Chris Hostetter <ho...@fucit.org>.
: Yes, I am populating "ptime" using a default of "NOW".
: 
: I only store the id, so I can't get ptime values. But from the perspective
: of business logic, ptime should not change.

If you are populating it using a *schema* default then the warning text I 
pasted into my last message would definitely apply to your situation and 
easily explain the behavior you are seeing -- because the schema 
defaults are applied on a per-node basis, so the values wouldn't be 
guaranteed to be consistent for the entire shard.

If you are populating it using an update processor that fills in a 
default (such as the TimestampUpdateProcessorFactory I linked to in my 
last message) prior to the distributed update logic, then everything should 
be working fine, and if you are seeing the order change then the problem is 
likely unrelated to my wild guess.

As Erick said: you have to give us a *lot* more details (exactly what 
your data looks like, what queries you are doing, what results you see, 
how those results differ from what you expect, etc...) so that we can provide 
more useful/meaningful advice.

https://wiki.apache.org/solr/UsingMailingLists


-Hoss
http://www.lucidworks.com/

Re: SolrCloud FunctionQuery inconsistency

Posted by sling <sl...@gmail.com>.
Thanks for your reply, Chris.

Yes, I am populating "ptime" using a default of "NOW".

I only store the id, so I can't get ptime values. But from the perspective
of business logic, ptime should not change.

Strangely, the sort result is consistent now... :(
I should run more test cases...




Re: SolrCloud FunctionQuery inconsistency

Posted by Chris Hostetter <ho...@fucit.org>.
: However, when I sort by "ptime desc", the result is consistent.
: The dateDeboost function generates a time weight from ptime, which is
: multiplied by the score.

As Erick mentioned, you haven't given us enough details to make any 
educated guesses as to what problem you are seeing.

My wild, uneducated, shot in the dark guess: are you populating "ptime" 
using a default of "NOW"?  

If so, can you rule out the function as an issue by asking for fl=id,ptime 
and confirming that the ptime for these documents sometimes varies slightly?

"NOTE: Although it is possible to configure a TrieDateField instance with 
a default value of "NOW" to compute a timestamp of when the document was 
indexed, this is not advisable when using SolrCloud since each replica of 
the document may compute a slightly different value. 
TimestampUpdateProcessorFactory is recommended instead."

https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/schema/TrieDateField.html
https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
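
The note above recommends TimestampUpdateProcessorFactory. A different way to
reach the same goal (every replica indexing an identical value) is to set the
timestamp once on the client before the document is sent. The sketch below
shows that client-side alternative, not the update processor itself; the
helper name with_ptime is hypothetical, and the field name ptime is taken
from this thread.

```python
from datetime import datetime, timezone


def with_ptime(doc, now=None):
    """Return a copy of doc whose ptime is fixed on the client side.

    An existing ptime is left untouched; otherwise the current UTC time is
    written in Solr's date format (ISO 8601, UTC, trailing "Z"), so every
    replica receives the same literal value instead of computing its own.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamped = dict(doc)
    stamped.setdefault("ptime", now.strftime("%Y-%m-%dT%H:%M:%SZ"))
    return stamped


doc = with_ptime({"id": "9EORHN5I00824IHR"})
print(doc)
```

The stamped document would then be sent through whatever indexing client is
already in use.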


-Hoss

Re: SolrCloud FunctionQuery inconsistency

Posted by sling <sl...@gmail.com>.
Thanks, Erick

I mean that the first id in the results is not consistent, and neither is
the maxScore.

While querying, I am indexing docs at the same time, but they are not
relevant to this query.

The updated docs cannot affect the tf calculations, and as for idf, they
should affect all docs equally, so the results should be consistent.

But the same query keeps returning different orderings (either sort A or
sort B), over and over.

Thanks,
sling




Re: SolrCloud FunctionQuery inconsistency

Posted by Erick Erickson <er...@gmail.com>.
I'm not quite sure what you're seeing as
inconsistent, you didn't say. Is it the
maxScore? Did you index any docs
in the meantime? Even though both
show 121 docs, if you updated some
docs it might affect the score because
the terms from the old docs still affect
tf/idf calcs and thus the boosted score.

Or if an optimize or merge happened,
that might also affect things.

Best,
Erick

