Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2016/08/02 04:23:20 UTC

[jira] [Commented] (SOLR-9366) replicas in LIR seem to still be participating in search requests

    [ https://issues.apache.org/jira/browse/SOLR-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403321#comment-15403321 ] 

Shalin Shekhar Mangar commented on SOLR-9366:
---------------------------------------------

Yes, this is to be expected. The reason is that the queries execute before the new state (i.e. 'down') is propagated to all the nodes.

You can see this in the logs (the following excerpt is for shard5 only):
{code}
   [junit4]   2> 96432 INFO  (updateExecutor-29-thread-10-processing-x:test_col_shard5_replica2 r:core_node1 http:////127.0.0.1:35941//solr//test_col_shard5_replica1// n:127.0.0.1:36018_solr s:shard5 c:test_col) [n:127.0.0.1:36018_solr c:test_col s:shard5 r:core_node1 x:test_col_shard5_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Asking core=test_col_shard5_replica1 coreNodeName=core_node6 on http://127.0.0.1:35941/solr to recover
....
   [junit4]   2> 96458 INFO  (qtp1947867562-51) [n:127.0.0.1:35941_solr    ] o.a.s.h.a.CoreAdminOperation It has been requested that we recover: core=test_col_shard5_replica1
....
   [junit4]   2> 96510 INFO  (OverseerStateUpdate-96343095816486940-127.0.0.1:36018_solr-n_0000000000) [n:127.0.0.1:36018_solr    ] o.a.s.c.o.ReplicaMutator Update state numShards=5 message={
   [junit4]   2>   "core":"test_col_shard5_replica1",
   [junit4]   2>   "core_node_name":"core_node6",
   [junit4]   2>   "roles":null,
   [junit4]   2>   "base_url":"http://127.0.0.1:35941/solr",
   [junit4]   2>   "node_name":"127.0.0.1:35941_solr",
   [junit4]   2>   "numShards":"5",
   [junit4]   2>   "state":"recovering",
   [junit4]   2>   "shard":"shard5",
   [junit4]   2>   "collection":"test_col",
   [junit4]   2>   "operation":"state"}
....
   [junit4]   2> 96527 INFO  (qtp623658064-525) [n:127.0.0.1:36018_solr    ] o.a.s.h.a.CoreAdminOperation Will wait a max of 183 seconds to see test_col_shard5_replica2 (shard5 of test_col) have state: recovering
....
   [junit4]   2> 96713 INFO  (OverseerStateUpdate-96343095816486940-127.0.0.1:36018_solr-n_0000000000) [n:127.0.0.1:36018_solr    ] o.a.s.c.o.ZkStateWriter going to update_collection /collections/test_col/state.json version: 14
....
[junit4]   2> 96715 INFO  (zkCallback-45-thread-2-processing-n:127.0.0.1:50832_solr) [n:127.0.0.1:50832_solr    ] o.a.s.c.c.ZkStateReader Updating data for [test_col] from [14] to [15]
[junit4]   2> 96715 INFO  (zkCallback-50-thread-5-processing-n:127.0.0.1:36018_solr) [n:127.0.0.1:36018_solr    ] o.a.s.c.c.ZkStateReader Updating data for [test_col] from [14] to [15]
{code}

# Time: 96432 -- The shard5 leader asks replica shard5_replica1 to recover.
# Time: 96458 -- The replica shard5_replica1 receives the recovery request. At this point, it internally marks itself as being in the recovering state.
# Time: 96510 -- The message to publish shard5_replica1 as 'down' is received at the overseer.
# Time: 96527 -- The shard5 leader, i.e. shard5_replica2, is still waiting to see shard5_replica1 as down. Note that the query is being executed around this time.
# Time: 96713 -- The overseer actually writes the new collection state to ZooKeeper. The delay is because the overseer buffers writes to ZK. The query has already executed by 96534.
# Time: 96715 -- The new collection state is finally available at the relevant nodes.

In summary, state updates cannot be propagated instantaneously to the entire cluster, and this information asymmetry will always exist unless we implement consensus for each update. Also note that this is a *temporary* issue because the replica in recovery will no longer be selected for shard requests once the state has propagated across the cluster.

It is a bad idea to use Solr as a strongly consistent data store, as this test does. The test can do one of the following:
# Wait for recovery to complete before asserting doc counts
# Use min_rf with updates and assert that all documents which made it to both replicas are returned in the query results
# Query each shard leader individually (distrib=false) and sum up the numFound on the client (see the sketch after this list)
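
To make the third option concrete, here is a rough sketch using only SolrJ and the cluster state that is already published to ZooKeeper. The collection name and ZK address are placeholders and the builder calls are from the SolrJ 6.x API, so treat it as illustrative rather than copy-paste ready. It looks up the leader of each slice and runs a non-distributed query against each leader core, summing numFound:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

public class LeaderOnlyCount {
  public static void main(String[] args) throws Exception {
    String collection = "test_col";                   // placeholder collection name
    try (CloudSolrClient cloud = new CloudSolrClient.Builder()
        .withZkHost("127.0.0.1:9983")                 // placeholder ZK address
        .build()) {
      cloud.connect();
      DocCollection coll = cloud.getZkStateReader().getClusterState().getCollection(collection);

      long total = 0;
      for (Slice slice : coll.getActiveSlices()) {
        Replica leader = slice.getLeader();
        // base_url and core are the same properties visible in the state.json excerpt above
        String coreUrl = leader.getStr("base_url") + "/" + leader.getStr("core");
        try (HttpSolrClient direct = new HttpSolrClient.Builder()
            .withBaseSolrUrl(coreUrl).build()) {
          SolrQuery q = new SolrQuery("*:*");
          q.set("distrib", false);                    // do not fan out; count only this core
          total += direct.query(q).getResults().getNumFound();
        }
      }
      System.out.println("numFound summed across shard leaders: " + total);
    }
  }
}
{code}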

That being said, we can improve a few things:
# If the query happens to be aggregated by the leader, it can take LIR into account while selecting a replica for the non-distrib request
# The overseer can help by flushing 'down' state updates to ZK faster instead of buffering them.
# Provide a way to query only the leaders, for full consistency (a client-side approximation is sketched after this list)
# Use LIR information from ZK, in addition to live_nodes, to select a replica for the non-distrib request -- I am not sure we should do this. If we do, we should make sure we do not hit ZK in the fast path.
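
On the "query leaders only" idea: until a first-class option exists, a client or test can approximate it today with the long-standing shards parameter by listing the leaders' core addresses explicitly. Again this is only a sketch built on the public cluster-state API; the class and method names below are made up for illustration:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

public class LeadersOnlyQuery {
  /** Approximate "query leaders only" by pinning the shards parameter to the leader cores. */
  static long leadersOnlyNumFound(CloudSolrClient cloud, String collection) throws Exception {
    StringBuilder shards = new StringBuilder();
    for (Slice slice : cloud.getZkStateReader().getClusterState()
                            .getCollection(collection).getActiveSlices()) {
      Replica leader = slice.getLeader();
      if (shards.length() > 0) shards.append(',');
      // shards entries are host:port/context/core addresses, e.g.
      // 127.0.0.1:35941/solr/test_col_shard5_replica1 (same values as in state.json above)
      shards.append(leader.getStr("base_url").replaceFirst("^https?://", ""))
            .append('/')
            .append(leader.getStr("core"));
    }
    SolrQuery q = new SolrQuery("*:*");
    q.set("shards", shards.toString());  // aggregate only over the listed leader cores
    return cloud.query(collection, q).getResults().getNumFound();
  }
}
{code}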

None of the above will completely avoid this problem, but they will make it less likely.

> replicas in LIR seem to still be participating in search requests
> -----------------------------------------------------------------
>
>                 Key: SOLR-9366
>                 URL: https://issues.apache.org/jira/browse/SOLR-9366
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>         Attachments: SOLR-9366.txt.gz
>
>
> Spinning this off from SOLR-9363 where sarowe's jenkins encountered a strange test failure when TestInjection is used causing replicas to return errors on some requests.
> Reading over the logs it appears that when this happens, and the replicas are put into LIR, they then ignore an explicit user-requested commit (ie: {{Ignoring commit while not ACTIVE - state: BUFFERING replay: false}}) but still participate in queries -- apparently before they finish recovery (or at the very least before / without doing the commit/openSearcher that they previously ignored).
> Details and logs to follow...


