Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2018/01/04 18:21:24 UTC

SolrCmdDistributor retries.

Down in SolrCmdDistributor.doRetriesIfNeeded there is a series of specific
response codes that we retry on:

if (isRetry) {
  if (rspCode == 404 || rspCode == 403 || rspCode == 503) {
    doRetry = true;
  }

...

A 401 is absent from that list. What I think I'm seeing
in the field is a timeout during a distributed update,
reported like this:


    Invalid key request timestamp: 1513273351295 ,
    received timestamp: 1513273356700 , TTL: 5000

This appears to only happen very occasionally.
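For concreteness, the two timestamps in that log line differ by 1513273356700 - 1513273351295 = 5405 ms, just over the 5000 ms TTL. Below is a minimal sketch of the check the message implies. It assumes the receiver rejects a request when the received time minus the request timestamp exceeds the TTL. The class and method names are made up for illustration and are not Solr's actual code.

```java
// Hypothetical sketch, not Solr's PKI authentication code.
// Assumption: a request is rejected when (receivedTimestamp - requestTimestamp) > ttlMs.
class PkiTtlCheck {

    // Fallback TTL matching the "TTL: 5000" in the log line.
    static final long FALLBACK_TTL_MS = 5000L;

    // True if the request arrived too long after it was stamped.
    static boolean isExpired(long requestTimestamp, long receivedTimestamp, long ttlMs) {
        return receivedTimestamp - requestTimestamp > ttlMs;
    }

    public static void main(String[] args) {
        long requestTs = 1513273351295L;   // "Invalid key request timestamp"
        long receivedTs = 1513273356700L;  // "received timestamp"

        // Mirrors -Dpkiauth.ttl=...; falls back to 5000 ms when the property is unset.
        long ttl = Long.getLong("pkiauth.ttl", FALLBACK_TTL_MS);

        System.out.println("elapsed ms: " + (receivedTs - requestTs));
        System.out.println("expired with TTL " + ttl + ": " + isExpired(requestTs, receivedTs, ttl));
        System.out.println("expired with TTL 10000: " + isExpired(requestTs, receivedTs, 10000L));
    }
}
```

With the logged values, the request arrives 405 ms past a 5000 ms TTL. A 10000 ms TTL would accept it.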

The problem is that this leads to the leader putting the
follower into LIR (leader-initiated recovery), with all the problems that entails.


Certainly the timeout can be lengthened with:
-Dpkiauth.ttl=10000

or whatever value you like, but my question is whether it makes
sense for SolrCmdDistributor to retry in this case.
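Below is a minimal sketch of the decision being asked about, written as a standalone Java predicate. The 404/403/503 set mirrors the quoted snippet. The class name and the `retryOn401` switch are hypothetical and only model the question. They are not part of SolrCmdDistributor.

```java
import java.util.Set;

// Hypothetical sketch of the status-code check quoted from doRetriesIfNeeded.
// Not Solr's actual code: the retryOn401 switch only models the question in this mail.
class RetryPolicySketch {

    // Codes the quoted snippet already retries on.
    private static final Set<Integer> RETRYABLE = Set.of(404, 403, 503);

    // Hypothetical: whether a 401 (e.g. an expired PKI timestamp) should also be retried.
    private final boolean retryOn401;

    RetryPolicySketch(boolean retryOn401) {
        this.retryOn401 = retryOn401;
    }

    // isRetry mirrors the flag guarding the quoted snippet.
    boolean shouldRetry(boolean isRetry, int rspCode) {
        if (!isRetry) {
            return false;
        }
        return RETRYABLE.contains(rspCode) || (retryOn401 && rspCode == 401);
    }
}
```

The sketch puts 401 behind a switch because a 401 can also mean a genuine authentication failure, and retrying would not help in that case.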

NOTE: I only have logs from Solr 6.3 for this, but I see
no evidence that this has changed since then.

Comments?