Posted to solr-user@lucene.apache.org by Chris Ulicny <cu...@iq.media> on 2017/08/03 12:30:28 UTC

Get handler failure

Hi all,

I've run into an issue in a test environment where a document exists but
is not consistently returned by /get requests. In a series of 10 requests
for that document over a span of a few minutes, one of the middle requests
returned a null document.
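
A minimal sketch of how the repeated requests above could be scripted, in
Python. The base URL, collection name, and document id are placeholders, not
values from this environment, and the helper names are made up for
illustration. It assumes the real-time get handler answers a single-id
request with a JSON body whose "doc" field is null when nothing is found.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def get_url(base_url, collection, doc_id):
    """Build a real-time get URL for one document id (hypothetical helper)."""
    query = urlencode({"id": doc_id, "wt": "json"})
    return f"{base_url}/{collection}/get?{query}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def count_missing(url, attempts=10, fetch=fetch_json):
    """Issue the same /get request several times.

    Returns the 1-based numbers of the requests whose "doc" was null,
    so an intermittent miss shows up as a non-empty list.
    """
    missing = []
    for attempt in range(1, attempts + 1):
        body = fetch(url)
        if body.get("doc") is None:
            missing.append(attempt)
    return missing
```

Calling count_missing(get_url("http://localhost:8983/solr", "mycollection",
"some-id")) against a live node would list which of the 10 requests came back
null. The fetch argument is injectable so the loop can be exercised without a
running cluster.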

Currently, nothing is updating existing records in the collection, so it
couldn't have actually been deleted.

The test cloud has 3 nodes, and the collection has 6 shards with 1 replica
per shard. Because the node that was queried was not the node the document
resides on, I suspect there was a temporary connectivity issue that we're
unaware of, so the request couldn't find the document and returned null.

Is that a possibility? And are there any other circumstances in which the
/get handler would not be able to return a document that exists in a
collection?

Thanks,
Chris

Re: Get handler failure

Posted by Chris Ulicny <cu...@iq.media>.
By 1 replica, I mean a single copy of the shard with no redundancy.

We haven't encountered any unexpected problems with the Solr instances in
the testing environment, at least none that I'm aware of.

I do have the logs saved from the time frame in which the issue occurred, if
those would be useful. We're running Solr 6.3.0 on Ubuntu 16.04 virtual
machines.

On Thu, Aug 3, 2017 at 9:18 AM Shawn Heisey <ap...@elyograg.org> wrote:

> On 8/3/2017 6:30 AM, Chris Ulicny wrote:
> > I've run into an issue in a test environment where a document exists, but
> > fails to be retrieved consistently by /get requests. In a series of 10
> > requests for the specific document across a few minute timespan, one of the
> > middle requests returned a null document.
> >
> > Currently, nothing is updating existing records in the collection, so it
> > couldn't have actually been deleted.
> >
> > The test cloud and collection have 3 nodes, 6 shards, and 1 replica per
> > shard. Based on the fact that the node that was queried was not the node
> > the document resided on, I assume that there may have been a temporary
> > connectivity issue that we're unaware of and the request couldn't find the
> > document and returned null.
>
> When you say "1 replica" do you mean that there are two copies of each
> shard (leader and replica) or one copy (no redundancy)?  I ask because
> this is a common point of confusion about SolrCloud terminology.  If you
> have two copies, then you have two replicas -- because the leader IS a
> replica.
>
> If there are two copies, you might be in a situation where the two
> copies are out of sync for some reason, and one copy has the document
> but the other doesn't.  Because SolrCloud load balances requests,
> sometimes the query will be serviced by one copy, sometimes by the other.
>
> If there is only one copy of each shard, then I do not know how this
> could happen, and it might indicate some kind of a problem with your
> install.
>
> Thanks,
> Shawn

Re: Get handler failure

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/3/2017 6:30 AM, Chris Ulicny wrote:
> I've run into an issue in a test environment where a document exists, but
> fails to be retrieved consistently by /get requests. In a series of 10
> requests for the specific document across a few minute timespan, one of the
> middle requests returned a null document.
>
> Currently, nothing is updating existing records in the collection, so it
> couldn't have actually been deleted.
>
> The test cloud and collection have 3 nodes, 6 shards, and 1 replica per
> shard. Based on the fact that the node that was queried was not the node
> the document resided on, I assume that there may have been a temporary
> connectivity issue that we're unaware of and the request couldn't find the
> document and returned null.

When you say "1 replica" do you mean that there are two copies of each
shard (leader and replica) or one copy (no redundancy)?  I ask because
this is a common point of confusion about SolrCloud terminology.  If you
have two copies, then you have two replicas -- because the leader IS a
replica.

If there are two copies, you might be in a situation where the two
copies are out of sync for some reason, and one copy has the document
but the other doesn't.  Because SolrCloud load balances requests,
sometimes the query will be serviced by one copy, sometimes by the other.
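
One way to check for the out-of-sync case described above is to ask each copy
of the shard for the document directly, bypassing load balancing. The Python
sketch below is illustrative only: the core URLs are hypothetical
placeholders, the helper names are invented, and it assumes that sending
distrib=false to a core's /get handler keeps the lookup local to that core.
It also assumes a single-id /get response carries a "doc" field that is null
when the document is not found.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_get_url(core_url, doc_id):
    """Build a non-distributed /get URL for one core (hypothetical helper)."""
    query = urlencode({"id": doc_id, "distrib": "false", "wt": "json"})
    return core_url.rstrip("/") + "/get?" + query


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def replicas_missing_doc(core_urls, doc_id, fetch=fetch_json):
    """Return the core URLs whose local /get lookup found no document."""
    missing = []
    for core_url in core_urls:
        body = fetch(replica_get_url(core_url, doc_id))
        if body.get("doc") is None:
            missing.append(core_url)
    return missing
```

If the list returned for the two copies of the owning shard contains exactly
one of them, the copies disagree about the document. The core URLs would be
of the form http://host:8983/solr/<core name>, taken from the cluster's own
core listing rather than guessed.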

If there is only one copy of each shard, then I do not know how this
could happen, and it might indicate some kind of a problem with your
install.

Thanks,
Shawn