Posted to dev@lucene.apache.org by "Nathan Neulinger (JIRA)" <ji...@apache.org> on 2013/10/31 00:54:25 UTC
[jira] [Commented] (SOLR-5407) Strange error condition with cloud
replication not working quite right
[ https://issues.apache.org/jira/browse/SOLR-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809767#comment-13809767 ]
Nathan Neulinger commented on SOLR-5407:
----------------------------------------
The only error we could find in the logs was this:
09:08:01 WARN PeerSync no frame of reference to tell if we've missed updates
09:25:49 WARN Overseer
09:25:49 ERROR SolrDispatchFilter null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
09:25:49 ERROR SolrDispatchFilter null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
09:25:49 WARN OverseerCollectionProcessor Overseer cannot talk to ZK
09:25:49 ERROR SolrDispatchFilter null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
09:25:49 ERROR SolrDispatchFilter null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
09:25:49 ERROR SolrDispatchFilter null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
09:25:49 ERROR SolrDispatchFilter null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
09:26:37 WARN PeerSync no frame of reference to tell if we've missed updates
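To make the shape of that excerpt easier to see, the lines can be tallied by level and component. The following is only a minimal sketch: the `summarize` function and the regular expression are illustrative names made up here, not part of Solr or ZooKeeper, and the input is just the ten lines quoted above.

```python
import re
from collections import Counter

# The ZooKeeper session-expired line that repeats in the excerpt above.
EXPIRED = (
    "09:25:49 ERROR SolrDispatchFilter "
    "null:org.apache.zookeeper.KeeperException$SessionExpiredException: "
    "KeeperErrorCode = Session expired for /aliases.json"
)

# The ten log lines from the comment, in their original order.
LOG = "\n".join([
    "09:08:01 WARN PeerSync no frame of reference to tell if we've missed updates",
    "09:25:49 WARN Overseer",
    EXPIRED,
    EXPIRED,
    "09:25:49 WARN OverseerCollectionProcessor Overseer cannot talk to ZK",
    EXPIRED,
    EXPIRED,
    EXPIRED,
    EXPIRED,
    "09:26:37 WARN PeerSync no frame of reference to tell if we've missed updates",
])

# Assumed line layout: HH:MM:SS LEVEL Component [message]
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(WARN|ERROR)\s+(\S+)\s*(.*)$")


def summarize(text):
    """Count lines per (level, component) and collect session-expiry timestamps."""
    counts = Counter()
    expired_at = set()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not follow the assumed layout
        timestamp, level, component, message = match.groups()
        counts[(level, component)] += 1
        if "SessionExpiredException" in message:
            expired_at.add(timestamp)
    return counts, sorted(expired_at)


if __name__ == "__main__":
    counts, expired_at = summarize(LOG)
    for (level, component), n in sorted(counts.items()):
        print(level, component, n)
    print("session expired at:", expired_at)
```

On this excerpt, every session-expired error carries the same timestamp, and the two PeerSync warnings fall before and after it.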
> Strange error condition with cloud replication not working quite right
> ----------------------------------------------------------------------
>
> Key: SOLR-5407
> URL: https://issues.apache.org/jira/browse/SOLR-5407
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.5
> Reporter: Nathan Neulinger
> Labels: cloud, replication
>
> I have a cloud deployment of 4.5 on EC2. The architecture is 3 dedicated ZK nodes and a pair of Solr nodes. I apologize in advance that this error report does not have a lot of detail; I'm really hoping that the scenario/description will suggest a likely explanation.
> The situation I got into was that the server had decided to fail over, so my app servers were all talking to what should have been the primary for most of the shards/collections, but was actually the replica.
> Here's where it gets odd - no errors being returned to the client code for any of the searches or document updates - and the current primary server was definitely receiving all of the updates - even though they were being submitted to the inactive/replica node. (clients talking to solr-p1, which was not primary at the time, and writes were being passed through to solr-r1, which was primary at the time.)
> All sounds good so far, right? Except the replica server at the time, through which the writes were passing, never got any of those content updates. It had an old, unmodified copy of the index.
> I restarted solr-p1 (was the replica at the time) - no change in behavior. Behavior did not change until I killed and restarted the current primary (solr-r1) to force it to fail over.
> At that point, everything was all happy again and working properly.
> Until this morning, when one of the developers provisioned a new collection, which happened to put its primary on solr-r1. Again, clients were all pointing at solr-p1. The developer reported that the documents were going into the index, but were not visible on the replica server.