Posted to solr-user@lucene.apache.org by Babak Farhang <fa...@gmail.com> on 2010/07/02 10:05:31 UTC
Re: questions about Solr shards

Thanks Joe. This is all very interesting. So though it helps us scale,
sharding doesn't come cheap.

On Mon, Jun 28, 2010 at 9:50 AM, Joe Calderon <ca...@gmail.com> wrote:
> There is a first-pass query that retrieves all matching document IDs from
> every shard, along with the relevant sorting information. The document IDs
> are then sorted and limited to the number needed, and a second query
> is sent for the rest of those documents' metadata.
>
> On Sun, Jun 27, 2010 at 7:32 PM, Babak Farhang <fa...@gmail.com> wrote:
>> Otis,
>>
>> Belated thanks for your reply.
>>
>>>> 2. "The index could change between stages, e.g. a document that
>>>> matched a query and was subsequently changed may no longer match
>>>> but will still be retrieved."
>>
>>> 2. This describes the situation where, for instance, a
>>> document with ID=10 is updated between the 2 calls
>>> to the Solr instance/shard where that doc ID=10 lives.
>>
>> Can you explain why this happens? (I.e. does each query to the sharded
>> index somehow involve 2 calls to each shard instance from the base
>> instance?)
>>
>> -Babak
>>
>> On Thu, Jun 24, 2010 at 10:14 PM, Otis Gospodnetic
>> <ot...@yahoo.com> wrote:
>>> Hi Babak,
>>>
>>> 1. Yes, you are reading that correctly.
>>>
>>> 2. This describes the situation where, for instance, a document with ID=10 is updated between the 2 calls to the Solr instance/shard where that doc ID=10 lives.
>>>
>>> 3. Yup, orthogonal.  You can have a master with multiple cores for sharded and non-sharded indices and you can have a slave with cores that hold complete indices or just their shards.
>>>  Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>>
>>>
>>>
>>> ----- Original Message ----
>>>> From: Babak Farhang <fa...@gmail.com>
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Thu, June 24, 2010 6:32:54 PM
>>>> Subject: questions about Solr shards
>>>>
>>>> Hi everyone,
>>>>
>>>> There are a couple of notes on the limitations of this approach at
>>>> http://wiki.apache.org/solr/DistributedSearch which I'm having trouble
>>>> understanding.
>>>>
>>>> 1. "When duplicate doc IDs are received, Solr chooses the first doc
>>>>    and discards subsequent ones"
>>>>
>>>> "Received" here is from the perspective of the base Solr instance at
>>>> query time, right?  I.e. if you inadvertently indexed 2 versions of
>>>> the document with the same unique ID but different contents to 2
>>>> shards, then at query time, the "first" document (putting aside for
>>>> the moment what exactly "first" means) would win.  Am I reading this
>>>> right?
>>>>
>>>> 2. "The index could change between stages, e.g. a document that matched a
>>>>    query and was subsequently changed may no longer match but will still be
>>>>    retrieved."
>>>>
>>>> I have no idea what this second statement means.
>>>>
>>>> And one other question about shards:
>>>>
>>>> 3. The examples I've seen documented do not illustrate sharded,
>>>> multicore setups; only sharded monolithic cores.  I assume sharding
>>>> works with multicore as well (i.e. the two issues are orthogonal).  Is
>>>> this right?
>>>>
>>>> Any help on interpreting the above would be much appreciated.
>>>>
>>>> Thank you,
>>>> -Babak
>>>
>>
>
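
For anyone reading this thread later, the two-pass flow Joe describes,
along with the "first duplicate wins" and "index could change between
stages" notes from the wiki, can be sketched roughly as below. This is a
toy in-memory model only, not Solr code: the Shard class, merge_ids and
fetch_docs are hypothetical names, and a plain score threshold stands in
for a real query.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy stand-in for one shard (hypothetical, not a Solr API)."""
    name: str
    docs: dict = field(default_factory=dict)  # doc id -> {"score": ..., "title": ...}

    def first_pass(self, min_score):
        # Pass 1: return only (id, sort value) pairs for matching docs.
        return [(doc_id, d["score"])
                for doc_id, d in self.docs.items()
                if d["score"] >= min_score]

    def second_pass(self, doc_ids):
        # Pass 2: fetch stored fields by id; the query is not re-evaluated.
        return {i: self.docs[i] for i in doc_ids if i in self.docs}


def merge_ids(shards, min_score, rows):
    """Pass 1 on every shard, then dedupe, sort and limit on the base instance."""
    seen = {}
    for shard in shards:
        for doc_id, score in shard.first_pass(min_score):
            if doc_id not in seen:  # duplicate id: the first one received wins
                seen[doc_id] = (score, shard)
    ranked = sorted(seen.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(doc_id, score, shard) for doc_id, (score, shard) in ranked[:rows]]


def fetch_docs(merged):
    """Pass 2: ask the owning shard for each surviving document."""
    results = []
    for doc_id, _score, shard in merged:
        fetched = shard.second_pass([doc_id])
        if doc_id in fetched:  # the doc may have been deleted in between
            results.append(fetched[doc_id])
    return results


if __name__ == "__main__":
    a = Shard("a", {"10": {"score": 0.9, "title": "original"}})
    b = Shard("b", {"10": {"score": 0.7, "title": "duplicate"},
                    "12": {"score": 0.8, "title": "other"}})
    merged = merge_ids([a, b], min_score=0.6, rows=2)
    # Simulate an update to doc 10 between the two passes.
    a.docs["10"] = {"score": 0.1, "title": "changed"}
    print(fetch_docs(merged))
```

In this sketch doc 10 on shard "b" is dropped because shard "a" answered
first, and the update made between the two passes is still returned even
though the changed document would no longer pass the threshold.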