Posted to solr-user@lucene.apache.org by Francois Perron <Fr...@Ticketmaster.com> on 2014/04/30 22:06:56 UTC

Shards don't return documents in same order

Hi guys,

  I have a small SolrCloud setup (3 servers, 1 collection with 1 shard and 3 replicas). In my schema, I have an alphaOnlySort field populated by a copyField.

This is part of my managed-schema:

    <field name="_root_" type="string" indexed="true" stored="false"/>
    <field name="_uid" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="event_id" type="string" indexed="true" stored="true"/>
    <field name="event_name" type="text_general" indexed="true" stored="true"/>
    <field name="event_name_sort" type="alphaOnlySort"/>

with the copyField:

  <copyField source="event_name" dest="event_name_sort"/>


The problem: I query my collection sorting on my alphaOnlySort field, but on one of my servers the sort order is different.

On servers 1 and 2, I get this result:

<doc>
<str name="event_name">MB20140410A</str>
</doc>
<doc>
<str name="event_name">MB20140410A-New</str>
</doc>
<doc>
<str name="event_name">MB20140411A</str>
</doc>



and on the third one, this:

<doc>
<str name="event_name">MB20140410A</str>
</doc>
<doc>
<str name="event_name">MB20140411A</str>
</doc>
<doc>
<str name="event_name">MB20140410A-New</str>
</doc>


The doc named "MB20140411A" should be at the end ...

Any ideas?

Regards

RE : RE : RE : Shards don't return documents in same order

Posted by Francois Perron <Fr...@Ticketmaster.com>.
Thanks for your help. I don't know why, but it's working now. It's probably related to a schema update without a core reload, as you said. I will double-check next time we change the schema.

Thank you

________________________________________
From: Erick Erickson [erickerickson@gmail.com]
Sent: May 6, 2014 11:39
To: solr-user@lucene.apache.org
Subject: Re: RE : RE : Shards don't return documents in same order

Re: RE : RE : Shards don't return documents in same order

Posted by Erick Erickson <er...@gmail.com>.
copyField should be working fine on all servers. What it sounds like
to me is that somehow your schema.xml file was different on one
machine. Now, this shouldn't be happening if you follow the practice
of altering your schema, pushing to ZooKeeper, _and_ restarting or
reloading your Solr nodes.

So if you, say, changed your schema to add the copyField, pushed it to
ZooKeeper and then didn't restart one of your nodes, _then_ indexed,
you'd see something like this.

Nobody has reported similar issues with copyField, and it's especially
easy to have something like this occur when you're experimenting, so
I'd guess pilot error.

Best way to propagate config changes is probably using the Collections
API and the RELOAD command after you upload the changes to
ZooKeeper....
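
A minimal Python sketch of that RELOAD step, assuming the changed config has already been uploaded to ZooKeeper (the upload itself is not shown). The node URL "http://server1:8983/solr" and the collection name "events" are hypothetical placeholders; only the action=RELOAD and name parameters come from the Collections API.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(solr_base_url: str, collection: str) -> str:
    """Build a Collections API RELOAD request for one collection."""
    query = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{solr_base_url.rstrip('/')}/admin/collections?{query}"


def reload_collection(solr_base_url: str, collection: str, timeout: float = 60.0) -> bytes:
    """Ask Solr to reload every core of the collection so all replicas pick up
    the config currently stored in ZooKeeper. Run after uploading the config."""
    with urlopen(reload_url(solr_base_url, collection), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical node URL and collection name; only builds the URL here.
    print(reload_url("http://server1:8983/solr", "events"))
```

Reloading through the collection (rather than core by core) is what keeps one replica from being left on the old schema, which is the situation described above.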

Best,
Erick

On Tue, May 6, 2014 at 5:37 AM, Francois Perron
<Fr...@ticketmaster.com> wrote:

RE : RE : Shards don't return documents in same order

Posted by Francois Perron <Fr...@Ticketmaster.com>.
Hi Erick,

  thanks for your help. After some checks, it appears that the sort field (the alphaOnlySort field) isn't populated on 'some' servers. On the leader, the sort order is correct and the sorted terms look OK. But on the server with the issue, /terms returns empty nodes (no data stored, I guess). After a full reindex (done the same way as the first time), all servers now work as expected. It's as if the copyField doesn't work on every server... it's really worrying.

Are there known issues with copyField on SolrCloud replicas?

Regards.
________________________________________
From: Erick Erickson [erickerickson@gmail.com]
Sent: May 2, 2014 12:07
To: solr-user@lucene.apache.org
Subject: Re: RE : Shards don't return documents in same order

Re: RE : Shards don't return documents in same order

Posted by Erick Erickson <er...@gmail.com>.
Francois:

Yes, there are several ways to examine the raw terms in the index:
- The admin/schema-browser page
- TermsComponent: https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
- Luke

The schema-browser is all set up for you, so it's the easiest. The
TermsComponent should be directly usable too; I believe it's
configured by default in solrconfig.xml. Luke takes a bit of setup but
is a great tool.
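
To compare what each replica actually indexed, the TermsComponent can be queried one core at a time. A rough Python sketch, assuming the /terms handler from the default solrconfig.xml is present; the core URLs are hypothetical, and distrib=false is passed so each request is answered by that core alone. Fetching and parsing the JSON responses is left out; missing_terms() only shows the comparison step.

```python
from urllib.parse import urlencode


def terms_url(core_base_url: str, field: str, limit: int = 100) -> str:
    """Build a TermsComponent request that only this core (replica) answers."""
    params = {
        "terms.fl": field,       # field whose indexed terms we want to see
        "terms.limit": limit,    # maximum number of terms returned
        "terms.sort": "index",   # list terms in index (lexical) order
        "distrib": "false",      # do not fan out to the rest of the collection
        "wt": "json",
    }
    return f"{core_base_url.rstrip('/')}/terms?{urlencode(params)}"


def missing_terms(per_replica: dict) -> dict:
    """For each replica, list the terms some other replica has but it lacks."""
    all_terms = set().union(*per_replica.values())
    return {name: sorted(all_terms - set(terms)) for name, terms in per_replica.items()}


if __name__ == "__main__":
    # Hypothetical core URLs; use the core names shown in your admin UI.
    replicas = {
        "server1": "http://server1:8983/solr/collection1_shard1_replica1",
        "server3": "http://server3:8983/solr/collection1_shard1_replica3",
    }
    for name, base in replicas.items():
        print(name, terms_url(base, "event_name_sort"))

    # Example term lists mirroring the symptom Francois reported,
    # where /terms came back empty on the problem server.
    print(missing_terms({"server1": ["mb20140410a", "mb20140411a"], "server3": []}))
```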

Did you re-index from scratch on all shards? I presume your ordering
is still not the same on all shards... the order I'd expect would be:
mb20140410a
mb20140410anew
mb20140411a
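
That expected order follows from comparing the indexed tokens themselves. A small sketch, assuming the sort compares each document's single indexed token byte by byte; the tokens are the ones Francois reported from the admin UI.

```python
# Tokens produced by the modified alphaOnlySort chain, per the admin UI
# results reported in this thread.
terms = {
    "MB20140410A": "mb20140410a",
    "MB20140411A": "mb20140411a",
    "MB20140410A-New": "mb20140410anew",
}


def expected_order(analyzed: dict) -> list:
    """Order names by the UTF-8 bytes of their single indexed token."""
    return sorted(analyzed, key=lambda name: analyzed[name].encode("utf-8"))


print(expected_order(terms))  # ['MB20140410A', 'MB20140410A-New', 'MB20140411A']
```

"mb20140410a" is a prefix of "mb20140410anew", so it sorts first; both sort before "mb20140411a" because they differ at the tenth character, '0' versus '1'.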

Best,
Erick


On Thu, May 1, 2014 at 8:27 AM, Francois Perron
<Fr...@ticketmaster.com> wrote:

RE : Shards don't return documents in same order

Posted by Francois Perron <Fr...@Ticketmaster.com>.
Hi Erick,

  thank you for your response. You are right: I changed alphaOnlySort to keep letters and numbers and to remove some articles (a, an, the).

This is the fieldType definition:

    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" replace="all" replacement="" pattern="(\b(a|an|the)\b|[^a-z,0-9])"/>
      </analyzer>
    </fieldType>
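
A rough Python emulation of this chain, useful for checking names offline. It assumes Python's re module matches this pattern the same way java.util.regex does, which should hold for plain ASCII input like these event names; it is not Solr's own code, and "The Event, Live" is a made-up test value.

```python
import re

# Pattern copied from the fieldType above. Python's re stands in for
# java.util.regex here; for plain ASCII names the two should agree.
CUSTOM_PATTERN = re.compile(r"(\b(a|an|the)\b|[^a-z,0-9])")


def custom_alpha_only_sort(value: str) -> str:
    """Approximate KeywordTokenizer -> LowerCase -> Trim -> PatternReplace."""
    token = value.lower().strip()         # one token: whole input, lowercased, trimmed
    return CUSTOM_PATTERN.sub("", token)  # replace="all", replacement=""


for name in ["MB20140410A", "MB20140411A", "MB20140410A-New", "The Event, Live"]:
    print(name, "=", custom_alpha_only_sort(name))
```

Because the comma sits inside the negated character class, commas survive the filter ("The Event, Live" becomes "event,live"); if that is not intended, dropping the comma from the class would strip them as well.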


Then I tested each name with the admin UI on each server, and these are the results:

server1

MB20140410A = mb20140410a
MB20140411A = mb20140411a
MB20140410A-New = mb20140410anew

server2

MB20140410A = mb20140410a
MB20140411A = mb20140411a
MB20140410A-New = mb20140410anew

server3

MB20140410A = mb20140410a
MB20140411A = mb20140411a
MB20140410A-New = mb20140410anew

"Unfortunately", all results are identical. Is there a way to view the data actually indexed in these documents? Could it be a problem with a particular server? All configs are in ZooKeeper, so all cores should have the same config, right? Is there any way to force a replica to resynchronize?

Regards,

Francois.

________________________________________
From: Erick Erickson [erickerickson@gmail.com]
Sent: April 30, 2014 16:36
To: solr-user@lucene.apache.org
Subject: Re: Shards don't return documents in same order

Re: Shards don't return documents in same order

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, take a look at the admin/analysis page for these inputs for
alphaOnlySort. If you're using the stock Solr distro, you're probably
not considering the effects of PatternReplaceFilterFactory, which is
removing all non-letters. So these three terms reduce to

mba
mba
mbanew

You can look at the actual indexed terms via the admin/schema-browser as well.

That said, unless you transposed the order because you were
concentrating on the numeric part, the doc with MB20140410A-New should
always sort last.

All of which is irrelevant if you're doing something else with
"alphaOnlySort", so please paste in the fieldType definition if you've
changed it.

What gets returned in the doc for _stored_ data is a verbatim copy,
NOT the output of the analysis chain, which can be confusing.

Oh, and Solr uses the internal Lucene doc ID to break ties, and docs
on different replicas can have different internal Lucene doc IDs
relative to each other as a result of merging, so that's something else
to watch out for.
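
A minimal sketch of how that tie-breaking can give different orders on different replicas, assuming the stock alphaOnlySort pattern ([^a-z]) and ascending tie-breaks on the internal doc ID. The per-replica doc IDs are invented for illustration.

```python
import re

# Assumption: the stock example schema's alphaOnlySort chain is
# KeywordTokenizer -> LowerCaseFilter -> TrimFilter -> PatternReplaceFilter
# with pattern="([^a-z])", replacement="", replace="all".
STOCK_PATTERN = re.compile(r"([^a-z])")


def stock_alpha_only_sort(value: str) -> str:
    """Approximate the single token the stock chain indexes for value."""
    return STOCK_PATTERN.sub("", value.lower().strip())


def sort_with_docid_tiebreak(names, internal_ids):
    """Sort on the analyzed key; equal keys fall back to the internal doc ID."""
    return sorted(names, key=lambda n: (stock_alpha_only_sort(n), internal_ids[n]))


names = ["MB20140410A", "MB20140411A", "MB20140410A-New"]

# Invented internal doc IDs: after merges, the same docs can sit in a
# different relative order on each replica.
replica_1 = {"MB20140410A": 0, "MB20140411A": 1, "MB20140410A-New": 2}
replica_2 = {"MB20140410A": 1, "MB20140411A": 0, "MB20140410A-New": 2}

print([stock_alpha_only_sort(n) for n in names])  # ['mba', 'mba', 'mbanew']
print(sort_with_docid_tiebreak(names, replica_1))  # ['MB20140410A', 'MB20140411A', 'MB20140410A-New']
print(sort_with_docid_tiebreak(names, replica_2))  # ['MB20140411A', 'MB20140410A', 'MB20140410A-New']
```

The two "mba" docs tie on the sort key, so their relative order depends only on each replica's internal IDs, while MB20140410A-New ("mbanew") sorts last everywhere.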

Best,
Erick

On Wed, Apr 30, 2014 at 1:06 PM, Francois Perron
<Fr...@ticketmaster.com> wrote: