Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2012/04/05 06:19:05 UTC

Re: SolrCloud replica and leader out of Sync somehow

Not sure if this got lost in the shuffle. Were there any thoughts on this?

On Wed, Mar 21, 2012 at 11:02 AM, Jamie Johnson <je...@gmail.com> wrote:
> Given that in a distributed environment the docids are not guaranteed
> to be the same across shards should the sorting use the uniqueId field
> as the tie breaker by default?
>
> On Tue, Mar 20, 2012 at 2:10 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Tue, Mar 20, 2012 at 2:02 PM, Jamie Johnson <je...@gmail.com> wrote:
>>> I'll try to dig for the JIRA.  Also I'm assuming this could happen on
>>> any sort, not just score correct?  Meaning if we sorted by a date
>>> field and there were duplicates in that date field order wouldn't be
>>> guaranteed for the same reasons right?
>>
>> Correct - internal docid is the tiebreaker for all sorts.
>>
>> -Yonik
>> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
>> Boston May 7-10
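
The point in the quoted exchange can be illustrated with a small sketch. This is not Solr code: the document contents, the "date" field name, and the top_docs helper are hypothetical, and list position stands in for the internal docid. It only models the statement that the internal docid breaks ties, so two copies of the same index that assigned docids in a different order can return tied documents in a different order.

```python
# Hypothetical model: both copies hold the same documents, but the internal
# docid (modeled here as the list position) depends on indexing order.
leader = [
    {"id": "a", "date": "2012-03-20"},
    {"id": "b", "date": "2012-03-20"},  # same date as "a": a tie
    {"id": "c", "date": "2012-03-19"},
]
replica = [leader[1], leader[0], leader[2]]  # same docs, different docid order


def top_docs(index):
    """Sort by date descending; ties keep ascending docid order."""
    by_docid = list(enumerate(index))  # (docid, doc), already docid-ascending
    # Python's sort is stable, so equal dates stay in docid order.
    ranked = sorted(by_docid, key=lambda pair: pair[1]["date"], reverse=True)
    return [doc["id"] for _, doc in ranked]


leader_order = top_docs(leader)
replica_order = top_docs(replica)
print(leader_order, replica_order)
```

In this model the two copies agree on "c" being last but disagree on the order of the tied documents "a" and "b".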

Re: SolrCloud replica and leader out of Sync somehow

Posted by Jamie Johnson <je...@gmail.com>.
Awesome, Yonik.  I'll indeed try this.  Thanks!

On Thu, Apr 5, 2012 at 10:20 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Thu, Apr 5, 2012 at 12:19 AM, Jamie Johnson <je...@gmail.com> wrote:
>> Not sure if this got lost in the shuffle, were there any thoughts on this?
>
> Sorting by "id" could be pretty expensive (memory-wise), so I don't
> think it should be default or anything.
> We also need a way for a client to hit the same set of servers again
> anyway (to handle other possible variations like commit time).
>
> To handle the tiebreak stuff, you could also sort by _version_ - that
> should be unique in an index and is already used under the covers and
> hence shouldn't add any extra memory overhead.  Versions increase over
> time, so "_version_ desc" should give you newer documents first.
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10

Re: SolrCloud replica and leader out of Sync somehow

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Apr 5, 2012 at 12:19 AM, Jamie Johnson <je...@gmail.com> wrote:
> Not sure if this got lost in the shuffle, were there any thoughts on this?

Sorting by "id" could be pretty expensive (memory-wise), so I don't
think it should be default or anything.
We also need a way for a client to hit the same set of servers again
anyway (to handle other possible variations like commit time).

To handle the tiebreak stuff, you could also sort by _version_ - that
should be unique in an index and is already used under the covers and
hence shouldn't add any extra memory overhead.  Versions increase over
time, so "_version_ desc" should give you newer documents first.
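
A rough sketch of the suggestion above, under stated assumptions: the documents, the "date" field, and the ranked_ids helper are made up for illustration, and the version numbers are placeholders rather than real _version_ values. It models two copies of an index that stored the same documents in a different internal order and shows that adding _version_ as a secondary sort key makes the tied documents come back in the same order from both. The last lines build the corresponding sort parameter for a request; the query and field names are assumptions, not taken from this thread.

```python
from urllib.parse import urlencode

# Hypothetical documents; _version_ values are placeholders that only need to
# be unique and to increase for newer documents.
docs = [
    {"id": "a", "date": "2012-03-20", "_version_": 101},
    {"id": "b", "date": "2012-03-20", "_version_": 102},  # ties with "a" on date
    {"id": "c", "date": "2012-03-19", "_version_": 103},
]
leader = docs
replica = [docs[1], docs[0], docs[2]]  # same docs, different internal order


def ranked_ids(index):
    """Sort by date desc, then _version_ desc as the tiebreaker."""
    # Stable two-pass sort: apply the secondary key first, then the primary.
    by_version = sorted(index, key=lambda d: d["_version_"], reverse=True)
    by_date = sorted(by_version, key=lambda d: d["date"], reverse=True)
    return [d["id"] for d in by_date]


leader_order = ranked_ids(leader)
replica_order = ranked_ids(replica)
print(leader_order == replica_order)

# Request parameters with the tiebreaker appended to the primary sort.
params = urlencode({"q": "*:*", "sort": "date desc,_version_ desc"})
print(params)
```

Because _version_ is unique per document, no two documents compare equal on the full sort key, so the internal order of each copy no longer affects the result.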

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10



