Posted to solr-user@lucene.apache.org by IJ <ja...@gmail.com> on 2014/07/01 18:42:49 UTC

Does Solr move documents between shards when the value of the shard key is updated?

Let's say I create a Solr collection with multiple shards (say 2 shards) and
set "router.field" to a field called "CompanyName". As we all know, during
indexing Solr computes a hash on the value indexed into "CompanyName" and
routes the document to the appropriate shard.

Let's say I index a document into this collection and Solr routes it to
Shard 1 (based on the computed hash). Now let's say I re-index the same
document (same unique key) but with a different value of "CompanyName", and
Solr now determines that the document should route to Shard 2. In that
situation, would Solr delete the older version of the document from Shard 1?
Or would I end up with two versions of the same document (same unique key),
one in each shard?

My system allows updates to the field that I choose as the shard key. I
definitely want the document to be moved from Shard 1 to Shard 2 when I
re-index it. Would this work as expected? Or should I be doing an explicit
delete prior to re-indexing such documents?
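
For readers following along, the routing step described in the first
paragraph can be pictured with a small Python sketch. This is a toy model,
not Solr code: shard_for and route are hypothetical names, and CRC32 with a
modulo is only a stand-in, since Solr's real router uses its own hash
function and per-shard hash ranges.

```python
import zlib


def shard_for(route_value: str, num_shards: int) -> int:
    """Map a route-field value to a shard index in [0, num_shards).

    Stand-in only: CRC32 plus modulo is NOT Solr's hashing scheme. It just
    shows that the shard is a pure function of the route value.
    """
    digest = zlib.crc32(route_value.encode("utf-8"))
    return digest % num_shards


def route(doc: dict, route_field: str, num_shards: int) -> int:
    """Pick the shard for a document from its route field alone."""
    # The unique key ("id") plays no part in this decision.
    return shard_for(doc[route_field], num_shards)
```

The point of the sketch is that the same "CompanyName" always lands on the
same shard, while a changed "CompanyName" may land on a different one,
regardless of the document's unique key.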




Re: Does Solr move documents between shards when the value of the shard key is updated?

Posted by Erick Erickson <er...@gmail.com>.
bq: Is this a BUG or a FEATURE in Solr

How about "just the way it works"?

You've changed the route key with the same
unique key, taking control of the routing.

When you change that routing, how is Solr to
know where the _old_ document lived? It would
have to, say, query the entire cluster for any doc
that had the given <uniqueKey> and delete it,
something that'd be horribly slow.

As to your follow-up question, I'm not totally sure.
I believe the delete is sent to all shards, but why
don't you test to see?

Best,
Erick
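
As an illustration of the explicit delete-then-re-index approach raised in
the thread, here is a toy Python sketch. It is not SolrJ: each shard is a
plain dict, and delete_everywhere and reindex_with_delete are hypothetical
helpers. It assumes a delete by id reaches every shard, which is the
behavior the reply above believes but suggests testing.

```python
from typing import Callable, Dict, List

Shard = Dict[str, dict]  # unique id -> document


def delete_everywhere(shards: List[Shard], doc_id: str) -> int:
    """Remove doc_id from every shard; return how many copies were removed."""
    removed = 0
    for shard in shards:
        if shard.pop(doc_id, None) is not None:
            removed += 1
    return removed


def reindex_with_delete(
    shards: List[Shard],
    doc: dict,
    route_field: str,
    router: Callable[[str, int], int],
) -> int:
    """Delete any old copy first, then index the doc on its new shard."""
    # Assumption: the delete is fanned out to all shards, so the old
    # location does not need to be known.
    delete_everywhere(shards, doc["id"])
    target = router(doc[route_field], len(shards))
    shards[target][doc["id"]] = doc
    return target
```

If the delete turns out not to reach every shard in a real cluster, the
client would need to remember the old route value and send the delete with
it; that variant is not shown here.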


On Wed, Jul 2, 2014 at 10:22 AM, IJ <ja...@gmail.com> wrote:
> So we do end up with two copies / versions of the same document (same unique
> id), one in each of the two shards. Is this a BUG or a FEATURE in Solr?
>
> I have a follow-up question: if one were to delete the document, let's say
> using the CloudSolrServer deleteById() API, would that attempt to delete the
> document in both (or all) shards? How would Solr determine which shard or
> shards to run the delete against?

Re: Does Solr move documents between shards when the value of the shard key is updated?

Posted by IJ <ja...@gmail.com>.
So we do end up with two copies / versions of the same document (same unique
id), one in each of the two shards. Is this a BUG or a FEATURE in Solr?

I have a follow-up question: if one were to delete the document, let's say
using the CloudSolrServer deleteById() API, would that attempt to delete the
document in both (or all) shards? How would Solr determine which shard or
shards to run the delete against?




Re: Does Solr move documents between shards when the value of the shard key is updated?

Posted by Erick Erickson <er...@gmail.com>.
You would end up with duplicate docs on the two shards.

Solr does its doc-id lookup only on the shard the update is routed to, not
on other shards. Routing takes place before this step, so you're going to
have two docs.

Best,
Erick
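
The order of operations described above (routing first, then a per-shard
id lookup) can be mimicked with a small in-memory Python model. It is only
a sketch: ToyCluster and demo_router are hypothetical names, the router is
hard-coded so the outcome is deterministic, and none of this is Solr code.

```python
from typing import Callable, Dict, List


class ToyCluster:
    """In-memory stand-in for a sharded collection (not Solr itself)."""

    def __init__(self, num_shards: int, route_field: str,
                 router: Callable[[str, int], int]) -> None:
        self.route_field = route_field
        self.router = router
        # One dict per shard, keyed by unique id.
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def index(self, doc: dict) -> int:
        # Step 1: routing picks the shard from the route field alone.
        target = self.router(doc[self.route_field], len(self.shards))
        # Step 2: the id lookup and overwrite happen only inside that shard.
        self.shards[target][doc["id"]] = doc
        return target

    def copies_of(self, doc_id: str) -> List[int]:
        """Return the indexes of all shards holding a copy of doc_id."""
        return [i for i, shard in enumerate(self.shards) if doc_id in shard]


def demo_router(value: str, num_shards: int) -> int:
    # Hypothetical fixed routing: "Acme" goes to shard 0, the rest to 1.
    return (0 if value == "Acme" else 1) % num_shards


cluster = ToyCluster(2, "CompanyName", demo_router)
cluster.index({"id": "42", "CompanyName": "Acme"})
cluster.index({"id": "42", "CompanyName": "Globex"})  # same id, new route value
duplicate_locations = cluster.copies_of("42")
```

In this model, re-indexing with an unchanged "CompanyName" overwrites the
document in place, while changing it leaves one copy on each shard.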

On Tue, Jul 1, 2014 at 9:42 AM, IJ <ja...@gmail.com> wrote:
> Let's say I create a Solr collection with multiple shards (say 2 shards) and
> set "router.field" to a field called "CompanyName". As we all know, during
> indexing Solr computes a hash on the value indexed into "CompanyName" and
> routes the document to the appropriate shard.
>
> Let's say I index a document into this collection and Solr routes it to
> Shard 1 (based on the computed hash). Now let's say I re-index the same
> document (same unique key) but with a different value of "CompanyName", and
> Solr now determines that the document should route to Shard 2. In that
> situation, would Solr delete the older version of the document from Shard 1?
> Or would I end up with two versions of the same document (same unique key),
> one in each shard?
>
> My system allows updates to the field that I choose as the shard key. I
> definitely want the document to be moved from Shard 1 to Shard 2 when I
> re-index it. Would this work as expected? Or should I be doing an explicit
> delete prior to re-indexing such documents?