Posted to solr-user@lucene.apache.org by Chris Ulicny <cu...@iq.media> on 2018/03/14 15:26:50 UTC

Implications of using implicit routing

Hi all,

We've been looking at using the implicit router for one of our collections,
and we're seeing some odd behavior that we're not sure is expected.

Is it recommended to use a uniqueKey for implicit routing? Is the following
behavior intended?

We have encountered the following issue. Create a collection with two
shards (S1,S2), implicit routing, with "id" as uniqueKey, and router.field
as "routingfield". If we index

{"id":"id1","routingfield":"S1"}

It goes into shard S1. Then if we need to reindex the document with a
different "routingfield" value:

{"id":"id1","routingfield":"S2"}

It goes into shard S2. However, when we query for the document, it seems
that both copies exist but get deduped on return, since selecting all
documents only ever returns a single document. Adding [shard] to the fl
list shows the document coming from S1 some of the time and S2 the rest.
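
To make the scenario concrete, here is a toy model of it in Python. It is
not Solr code: the class ImplicitCollection and its methods are made up for
illustration, and it assumes only what is described above (overwrite by
uniqueKey happens within one shard, and the merged result keeps one
document per uniqueKey). Real Solr reports [shard] as a shard address
rather than a bare name; the name is used here to keep the sketch short.

```python
import random


class ImplicitCollection:
    """Toy stand-in for a collection that uses implicit (manual) routing."""

    def __init__(self, shards, unique_key="id", router_field="routingfield"):
        self.unique_key = unique_key
        self.router_field = router_field
        # Each shard is its own independent index: uniqueKey -> document.
        self.shards = {name: {} for name in shards}

    def index(self, doc):
        # The document itself names the target shard.
        shard = self.shards[doc[self.router_field]]
        # Overwrite-by-uniqueKey only sees this one shard.
        shard[doc[self.unique_key]] = doc

    def copies(self, doc_id):
        """Names of all shards currently holding a document with this id."""
        return sorted(n for n, docs in self.shards.items() if doc_id in docs)

    def select_all(self):
        """Merge shard results, keeping one document per uniqueKey.

        Which shard's copy survives depends on the (shuffled) merge order,
        which mimics the answer flipping between S1 and S2.
        """
        names = list(self.shards)
        random.shuffle(names)
        merged = {}
        for name in names:
            for doc_id, doc in self.shards[name].items():
                merged.setdefault(doc_id, {**doc, "[shard]": name})
        return list(merged.values())


coll = ImplicitCollection(["S1", "S2"])
coll.index({"id": "id1", "routingfield": "S1"})
coll.index({"id": "id1", "routingfield": "S2"})
print(coll.copies("id1"))   # both shards hold a copy
print(coll.select_all())    # a single merged document from either shard
```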

Trying to use /get with just the id results in a NullReferenceException.
Adding the _route_ parameter works, but both documents can be retrieved,
one from each shard.

Thanks,
Chris

Re: Implications of using implicit routing

Posted by Chris Ulicny <cu...@iq.media>.
Shawn,

I knew that the shard had to be specified by the indexing process or
document, but I didn't realize that the uniqueness of the document across
the collection also had to be handled outside of Solr.

We've used the compositeId router successfully to route documents, but it
seemed that the implicit/manual routing might work for this new collection.
Apparently not, given that the indexing processes would have to enforce
uniqueness as well as distribution.

Thanks for the help.
Chris


Re: Implications of using implicit routing

Posted by Shawn Heisey <el...@elyograg.org>.

This is a common misconception with the implicit router. That name is a 
completely correct summary of what the router does, but it is one of 
those "overloaded" words in the English language that is often not 
completely understood.

A better name for "implicit" would actually be "manual." By using this 
router, you have told Solr not to worry about routing -- that you're 
going to handle it, and that you're going to make sure every document is 
unique across all shards.  Then you indexed the same document to two 
shards -- intentionally.  Solr isn't going to prevent that -- there's 
nothing it can do to prevent it without making all indexing a LOT slower.
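
As a rough sketch of what "handling it yourself" could look like on the
indexing side, the Python below plans the update commands for a reindex. It
assumes the indexing process keeps its own record of which shard each id
was last sent to (the last_shard dict and the plan_reindex function are
hypothetical, not part of Solr). The bodies follow Solr's JSON update
command format, and the delete is meant to be sent with the _route_
parameter naming the old shard. Whether that fits a given Solr version and
setup should be verified before relying on it.

```python
def plan_reindex(last_shard, doc, unique_key="id", router_field="routingfield"):
    """Return the update requests needed to keep `doc` unique across shards.

    last_shard: dict mapping uniqueKey value -> shard it was last indexed to
                (bookkeeping owned by the indexing process, an assumption).
    Each returned item has request "params" and a JSON "body".
    """
    doc_id = doc[unique_key]
    target = doc[router_field]
    requests = []

    previous = last_shard.get(doc_id)
    if previous is not None and previous != target:
        # Remove the stale copy from the shard it used to live in.
        requests.append({
            "params": {"_route_": previous},
            "body": {"delete": {"id": doc_id}},
        })

    # The add is routed by the document's own router.field value.
    requests.append({"params": {}, "body": {"add": {"doc": doc}}})

    last_shard[doc_id] = target
    return requests


seen = {}
print(plan_reindex(seen, {"id": "id1", "routingfield": "S1"}))
print(plan_reindex(seen, {"id": "id1", "routingfield": "S2"}))
```

The second call plans a delete against S1 before the add to S2, so only
one copy of id1 remains if both requests succeed.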

If you want Solr to handle routing for you, then you must use the 
compositeId router.  With that router, you do not get to specify which 
shard contains your document, and you cannot add shards after the 
collection is created; you can only split existing shards later 
(SPLITSHARD).
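
For contrast, here is a simplified Python sketch of hash-based routing. It
is only an illustration: zlib.crc32 is a stand-in for the hash Solr
actually uses, and shard_for is a made-up helper. The point is that the
shard is derived from the id alone, so reindexing the same id always lands
on the same shard and overwrites the old copy.

```python
import zlib


def shard_for(doc_id, shards):
    """Pick a shard from the id alone by splitting the 32-bit hash space
    into equal contiguous ranges, one per shard."""
    h = zlib.crc32(doc_id.encode("utf-8"))      # value in 0 .. 2**32 - 1
    width = 2**32 // len(shards)
    index = min(h // width, len(shards) - 1)    # clamp the top of the range
    return shards[index]


shards = ["S1", "S2"]
# Other fields in the document play no part in the decision.
print(shard_for("id1", shards))
print(shard_for("id1", shards))  # same id, same shard every time
```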

Thanks,
Shawn