Posted to dev@lucene.apache.org by Dennis Gove <dp...@gmail.com> on 2016/02/10 17:55:26 UTC

ZK Connection Failure leads to stale data

Just wanted to take a moment to get anyone's thoughts on the following
issues:

https://issues.apache.org/jira/browse/SOLR-8599
https://issues.apache.org/jira/browse/SOLR-8666

The originating problem was a DNS failure that caused some nodes in a
cloud setup to fail to connect to ZooKeeper. Those nodes were running but
were not participating in the cloud with the other nodes. The disconnected
nodes would respond to queries with stale data, though they would reject
ingest requests.

Ticket https://issues.apache.org/jira/browse/SOLR-8599 contains a patch
which ensures that a failed connection attempt to ZooKeeper is retried.
Previously the failure did not lead to a retry, so the node would keep
running and stay disconnected until the node itself was restarted.
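
To illustrate the retry idea only: the following is a minimal sketch, not
the code in the SOLR-8599 patch. The class ConnectRetry, the method
connectWithRetry, and all parameters are hypothetical names made up for
this example; the "connect" callable stands in for whatever actually opens
the ZooKeeper connection.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Solr or ZooKeeper.
class ConnectRetry {

    /**
     * Calls connect until it succeeds or maxAttempts attempts have failed.
     * The wait between attempts doubles each time, capped at maxDelayMs.
     */
    public static <T> T connectWithRetry(Callable<T> connect, int maxAttempts,
                                         long initialDelayMs, long maxDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (InterruptedException ie) {
                // Do not retry after an interrupt; restore the flag and stop.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay = Math.min(delay * 2, maxDelayMs);
                }
            }
        }
        // All attempts failed; surface the most recent failure.
        throw last;
    }
}
```

The attempt limit is a choice made for this sketch. A node that should
keep trying until ZooKeeper is reachable could pass a very large limit,
at the cost of never reporting the failure to its caller.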

Ticket https://issues.apache.org/jira/browse/SOLR-8666 contains a patch
which returns additional information to the client when a node may be
returning stale data because it is not connected to ZooKeeper. The intent
is not to change current behavior but to let the client know that
something might be wrong. If the collection is not being updated, the data
may not be stale, so it wouldn't matter that the node is disconnected from
ZooKeeper; if the collection is being updated, the data may be stale. The
response headers will now contain an entry to indicate this. The patch
also adds a header to the ping response to signal when the node is
disconnected from ZooKeeper.
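
As a rough sketch of how a client might act on such a header entry: the
example below models the response header as a plain Map and checks a
boolean flag. The key name "zkConnected", the class StaleCheck, and the
method mayBeStale are assumptions for illustration; check the SOLR-8666
patch for the actual entry name and type.

```java
import java.util.Map;

// Hypothetical client-side helper, not part of SolrJ.
class StaleCheck {

    // Assumed header key; the real name is defined by the patch.
    static final String ZK_FLAG = "zkConnected";

    /**
     * Returns true only when the header explicitly reports that the node
     * is not connected to ZooKeeper. A missing entry is treated as
     * "no warning", which matches nodes that do not send the entry.
     */
    public static boolean mayBeStale(Map<String, Object> responseHeader) {
        Object flag = responseHeader.get(ZK_FLAG);
        return Boolean.FALSE.equals(flag);
    }
}
```

A client could log a warning or re-issue the query against another node
when mayBeStale returns true, which keeps the server's behavior unchanged
while leaving the decision to the caller.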

I think the approach these patches take is good, but I wanted to get
others' thoughts in case I'm missing a situation where they might cause a
problem.

Thanks - Dennis

Re: ZK Connection Failure leads to stale data

Posted by Dennis Gove <dp...@gmail.com>.
David,

Thanks - I've been trying to get these in for a bit but have been running
into inconsistent test results described in
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157644#comment-15157644.


While I am able to get around it in this particular case, it does bring up
an interesting situation: there is a possibility of an infinite loop while
connecting to ZooKeeper if a DNS lookup fails and your ISP returns a
redirect instead of the expected DNS failure.

- Dennis

On Wed, Feb 10, 2016 at 6:18 PM, david.w.smiley@gmail.com <
david.w.smiley@gmail.com> wrote:

> Both sound very good to me, Dennis. Thanks.
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>

Re: ZK Connection Failure leads to stale data

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
Both sound very good to me, Dennis. Thanks.
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com