Posted to solr-user@lucene.apache.org by Chris Troullis <cp...@gmail.com> on 2017/05/14 14:10:19 UTC

Seeing odd behavior with implicit routing

Hi,

I've been experimenting with various sharding strategies with SolrCloud
(6.5.1), and am seeing some odd behavior when using the implicit router. I
am probably either doing something wrong or misinterpreting what I am
seeing in the logs, but if someone could help clarify, that would be awesome.

I created a collection using the implicit router with 10 shards named
shard1, shard2, etc. I indexed 3000 documents to each shard, routed by
setting the _route_ field on the documents in my schema. All works fine; I
verified there are 3000 documents in each shard.
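
A setup like the one described above could be reproduced roughly as
follows. This is only a sketch: the host, the collection name
"test_collection", the document ids, and the maxShardsPerNode value are
assumptions for illustration and are not taken from the original post. It
builds the Collections API CREATE request and the documents, but does not
send anything to Solr.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"   # assumed host and port
COLLECTION = "test_collection"        # hypothetical collection name

# Ten explicitly named shards, as in the post: shard1 .. shard10.
shard_names = [f"shard{i}" for i in range(1, 11)]

# Collections API CREATE call using the implicit router.
create_url = f"{SOLR}/admin/collections?" + urlencode({
    "action": "CREATE",
    "name": COLLECTION,
    "router.name": "implicit",
    "shards": ",".join(shard_names),
    # Assumption: a single-node test cluster, so all 10 shards must be
    # allowed to live on the same node.
    "maxShardsPerNode": 10,
})


def make_doc(doc_id, shard):
    """Build a document whose _route_ field names its target shard."""
    return {"id": doc_id, "_route_": shard}


# 3000 documents per shard; the ids are made up for the example.
docs = [make_doc(f"{shard}-{n}", shard)
        for shard in shard_names
        for n in range(3000)]

print(create_url)
print(len(docs), "documents prepared")
```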

The odd behavior I am seeing is when I try to route a query to a specific
shard. I submitted a simple query to shard1 using the request parameter
_route_=shard1. The query comes back fine, but when I looked in the logs,
it looked like it was issuing 3 separate requests:

1. The original query to shard1
2. A 2nd query to shard1 with the parameter ids=a bunch of document ids
3. The original query to a random shard (changes every time I run the query)

It looks like the first query is getting back a list of ids, and the 2nd
query is retrieving the documents for those ids? I assume this is some
SolrCloud implementation detail.

What I don't understand is the 3rd query. Why is it issuing the original
query to a random shard every time, when I am specifying the _route_? The
_route_ parameter is definitely doing something, because if I remove it, it
is querying all shards (which I would expect).
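
For concreteness, the two requests being compared here (with and without
_route_) would look something like the sketch below. The host, the
collection name, and the q=*:* query are assumptions; only the
_route_=shard1 parameter comes from the description above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SOLR = "http://localhost:8983/solr"   # assumed host and port
COLLECTION = "test_collection"        # hypothetical collection name


def select_url(query, route=None):
    """Build a /select URL, optionally restricted to one shard via _route_."""
    params = {"q": query}
    if route is not None:
        params["_route_"] = route
    return f"{SOLR}/{COLLECTION}/select?" + urlencode(params)


routed = select_url("*:*", route="shard1")   # should only involve shard1
unrouted = select_url("*:*")                 # fans out to all shards

print(routed)
print(unrouted)
```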

Any ideas? I can provide the actual queries from the logs if required.

Thanks,

Chris

Re: Seeing odd behavior with implicit routing

Posted by Chris Troullis <cp...@gmail.com>.
Shalin,

Thanks for the response and explanation! I logged a JIRA per your request
here: https://issues.apache.org/jira/browse/SOLR-10695

Chris


Re: Seeing odd behavior with implicit routing

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sun, May 14, 2017 at 7:40 PM, Chris Troullis <cp...@gmail.com> wrote:
> Hi,
>
> I've been experimenting with various sharding strategies with Solr cloud
> (6.5.1), and am seeing some odd behavior when using the implicit router. I
> am probably either doing something wrong or misinterpreting what I am
> seeing in the logs, but if someone could help clarify that would be awesome.
>
> I created a collection using the implicit router, created 10 shards, named
> shard1, shard2, etc. I indexed 3000 documents to each shard, routed by
> setting the _route_ field on the documents in my schema. All works fine, I
> verified there are 3000 documents in each shard.
>
> The odd behavior I am seeing is when I try to route a query to a specific
> shard. I submitted a simple query to shard1 using the request parameter
> _route_=shard1. The query comes back fine, but when I looked in the logs,
> it looked like it was issuing 3 separate requests:
>
> 1. The original query to shard1
> 2. A 2nd query to shard1 with the parameter ids=a bunch of document ids
> 3. The original query to a random shard (changes every time I run the query)
>
> It looks like the first query is getting back a list of ids, and the 2nd
> query is retrieving the documents for those ids? I assume this is some solr
> cloud implementation detail.
>
> What I don't understand is the 3rd query. Why is it issuing the original
> query to a random shard every time, when I am specifying the _route_? The
> _route_ parameter is definitely doing something, because if I remove it, it
> is querying all shards (which I would expect).
>
> Any ideas? I can provide the actual queries from the logs if required.

How many nodes is this collection distributed across? I suspect that
you are using a single node for experimentation?

What happens with the _route_=shard1 parameter and implicit routing is
that the _route_ parameter is resolved to a list of replicas of
shard1. But SolrJ uses only the node name of the replica along with
the collection name to make the request (this is important; we'll come
back to it below). So, ordinarily, that node hosts a single shard
(shard1) and when it receives the request, it will optimize the search
to take the non-distributed code path (since the replica has all the
data needed to satisfy the search).

But interesting things happen when the node hosts more than one shard
(say both shard1 and shard3). When we query such a node using just the
collection name, the collection name can be resolved to either shard1
or shard3 -- this is picked randomly without looking at the _route_
parameter at all. If shard3 is picked, it looks at the request, sees
that it doesn't have all the necessary data and decides to follow the
two-phase distributed search path. Phase 1 gets the ids and scores of
the documents matching the query from all participating shards (the
list of such shards is limited by the _route_ parameter, which in our
case will be only shard1). Phase 2 gets the actual stored fields to be
returned to the user. So you get three queries in the log: 1) phase 1
of distributed search hitting shard1, 2) phase 2 of distributed search
hitting shard1 and 3) the distributed scatter-gather search run by
shard3.
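
The behavior described above can be modeled with a small toy simulation.
This is not Solr code: handle_query and its arguments are hypothetical
names, and the model only captures the two points made above, namely that
the receiving core is picked without consulting _route_, and that a core
lacking the routed data falls back to the two-phase distributed path.

```python
import random


def handle_query(local_shards, routed_shards, pick=random.choice):
    """Return the (shard, request) log entries produced by one routed query.

    local_shards:  shards that have a core on the node receiving the request
    routed_shards: shards selected by the _route_ parameter
    pick:          how the node chooses a local core (random by default)
    """
    # The collection name is resolved to some local core; _route_ is ignored.
    entry = pick(local_shards)

    if routed_shards == [entry]:
        # The chosen core holds all the data: non-distributed code path.
        return [(entry, "original query")]

    # Otherwise the chosen core coordinates a two-phase distributed search
    # over the shards selected by _route_.
    log = []
    for shard in routed_shards:
        log.append((shard, "phase 1: ids and scores"))
    for shard in routed_shards:
        log.append((shard, "phase 2: stored fields for ids"))
    # The coordinating request finishes (and is logged) last.
    log.append((entry, "original query (coordinator)"))
    return log


# A single node hosting all ten shards, query routed to shard1.
node = [f"shard{i}" for i in range(1, 11)]
for shard, request in handle_query(node, ["shard1"]):
    print(shard, "->", request)
```

With ten shards on one node, this model lands on shard1 directly only one
time in ten; the rest of the time it produces the three log entries from
the original report, with a different coordinating shard on each run.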

So to recap, this is happening because you have more than one shard
hosted on a node. The easy workaround is to host each shard on its own
node. But we can improve things on the Solr side as well by 1)
having SolrJ resolve requests down to node name and core name, or 2)
having the collection name to core name resolution take the _route_
param into account. Either 1 or 2 would solve the problem. Can you
please open a Jira issue?
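
As a rough illustration of option 2, the resolution step could prefer a
local core that belongs to a routed shard and fall back to a random local
core only when none matches. The sketch below is a hypothetical toy model
with made-up names (resolve_core and its arguments); it is not actual Solr
code and not necessarily how the issue would be fixed.

```python
import random


def resolve_core(local_shards, routed_shards=None, rng=random):
    """Pick the local core that should serve a request for the collection.

    Prefers a local shard named by _route_ when one exists; otherwise
    falls back to a random local shard, as in the current behavior.
    """
    if routed_shards:
        preferred = [s for s in local_shards if s in routed_shards]
        if preferred:
            return rng.choice(preferred)
    return rng.choice(local_shards)


# A node hosting shard1 and shard3, with the query routed to shard1:
print(resolve_core(["shard1", "shard3"], ["shard1"]))
```

In this model a request routed to shard1 always lands on the shard1 core
when the node hosts one, so the non-distributed path can be taken.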

>
> Thanks,
>
> Chris



-- 
Regards,
Shalin Shekhar Mangar.