Posted to solr-user@lucene.apache.org by Webster Homer <we...@sial.com> on 2018/05/18 17:42:22 UTC
CDCR sensitive to network failures
Recently I encountered some problems with CDCR after we experienced network
problems, and I thought I'd share them.
I'm using Solr 7.2.0.
We have three SolrCloud clusters: we update one and use CDCR to forward
updates to the other two, which are hosted in a cloud.
Usually this works pretty well.
Recently we have experienced some serious but intermittent network issues.
When that occurs, we get tons of CDCR warnings:
CdcrReplicator Failed to forward update request to target:
bioreliance-catalog-assay
with errors like ClassCastException and/or NullPointerException.
Updates accumulate on the server, and it reports tons of errors in
cdcr?action=errors:
"2018-05-18T16:11:19.860Z","internal","2018-05-18T16:11:18.860Z","internal",
"2018-05-18T16:11:17.860Z","internal",
When I looked around on the source collection, I found tlog files like this:
-rw-r--r-- 1 apache apache 1376736 May 10 23:04 tlog.0000000000000000141.1600138985674375168
*-rw-r--r-- 1 apache apache 0 May 11 23:05 tlog.0000000000000000143.1600229645842644992*
*-rw-r--r-- 1 apache apache 65458 May 12 07:50 tlog.0000000000000000142.1600229582225539072*
-rw-r--r-- 1 apache apache 1355610 May 18 10:05 tlog.0000000000000000144.1600814785270644736
-rw-r--r-- 1 apache apache 1355610 May 18 10:16 tlog.0000000000000000145.1600815458585411584
-rw-r--r-- 1 apache apache 1355610 May 18 10:21 tlog.0000000000000000146.1600815785277652992
-rw-r--r-- 1 apache apache 1355610 May 18 10:29 tlog.0000000000000000147.1600816282070941696
Note the zero-length file, tlog.0000000000000000143.1600229645842644992,
and the truncated file, tlog.0000000000000000142.1600229582225539072.
The solution is to delete these files. Once they are removed, the
updates start flowing again.
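
For anyone who wants to spot such files without eyeballing a directory
listing, here is a minimal Python sketch. It only reports candidates and
deletes nothing. The function name, the small_ratio threshold, and the
"much smaller than the median" heuristic are my own assumptions, not anything
Solr provides; a small tlog is not necessarily corrupt, so treat the output
as a list of files to inspect.

```python
import os
import statistics


def find_suspect_tlogs(tlog_dir, small_ratio=0.1):
    """Return (name, size, reason) tuples for tlog files that look suspicious.

    Heuristic (an assumption, not a Solr rule): flag zero-length files and
    files smaller than small_ratio times the median non-empty tlog size.
    """
    entries = []
    for name in sorted(os.listdir(tlog_dir)):
        path = os.path.join(tlog_dir, name)
        # Only consider regular files named like transaction logs.
        if name.startswith("tlog.") and os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))

    nonzero = [size for _, size in entries if size > 0]
    median = statistics.median(nonzero) if nonzero else 0

    suspects = []
    for name, size in entries:
        if size == 0:
            suspects.append((name, size, "zero-length"))
        elif size < small_ratio * median:
            suspects.append((name, size, "much smaller than median"))
    return suspects
```

Call it with the path to the core's tlog directory, which depends on your
installation. With the sizes in the listing above, this heuristic would flag
the 0-byte and the 65458-byte files.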
These errors show up as warnings in the log; I would have expected them to
be errors. CDCR doesn't seem to be able to detect that the tlog is
corrupted.
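
Since the failures only surface as warnings, one option is to poll the CDCR
API from a monitoring job. The sketch below builds the request URL for a CDCR
action such as ERRORS or QUEUES and fetches the JSON response. The base URL
and collection name are placeholders, the helper names are made up for
illustration, and I have not relied on the exact shape of the response, so
the code returns the parsed JSON as-is.

```python
import json
import urllib.parse
import urllib.request


def cdcr_url(base_url, collection, action):
    """Build a CDCR API URL, e.g. .../solr/<collection>/cdcr?action=ERRORS."""
    query = urllib.parse.urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(collection)}/cdcr?{query}"


def fetch_cdcr(base_url, collection, action, timeout=10):
    """Fetch a CDCR action from the source collection and return parsed JSON."""
    url = cdcr_url(base_url, collection, action)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_cdcr("http://localhost:8983/solr", "my-collection",
"ERRORS") would query the errors report on a local node; alerting on a
growing error count or queue size is left to whatever monitoring you already
use.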
Hope this helps someone else. If there are better solutions, I'd like to
know.