Posted to issues@solr.apache.org by "Christine Poerschke (Jira)" <ji...@apache.org> on 2021/10/05 17:20:00 UTC

[jira] [Comment Edited] (SOLR-15676) PeerSync failure due to RealTimeGetComponent.getUpdates returning duplicate DBQs

    [ https://issues.apache.org/jira/browse/SOLR-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424592#comment-17424592 ] 

Christine Poerschke edited comment on SOLR-15676 at 10/5/21, 5:19 PM:
----------------------------------------------------------------------

We were investigating unexpected and mysterious PeerSync failures.

The failures correlated with INFO log messages of the form
{code:java}
Requested N updates from ... but retrieved N-x
{code}

From log analysis and code reading, [~rwhaddad] suspected that {{PeerSync.MissedUpdatesFinderBase.handleVersionsWithRanges}} was returning too high a {{totalRequestedVersions}} value because of duplicate versions in its inputs. Extra logging confirmed that hypothesis.
 * [https://github.com/apache/solr/pull/327] proposes to extend the {{PeerSyncTest.handleVersionsWithRangesTests()}} added in SOLR-15667 to capture that behaviour.
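
To illustrate the counting problem only, here is a minimal sketch. It is not the actual {{handleVersionsWithRanges}} code: the class name, method names and version values are hypothetical. It shows how counting every listed version yields a higher "requested" number than the number of distinct updates that can be retrieved.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not taken from the Solr code base.
class DuplicateVersionCount {

    /** Counts every list entry, so a version that is listed twice is counted twice. */
    static int naiveRequestedCount(List<Long> versions) {
        return versions.size();
    }

    /** Counts each version once, which matches the number of distinct updates available. */
    static int distinctRequestedCount(List<Long> versions) {
        Set<Long> distinct = new LinkedHashSet<>(versions);
        return distinct.size();
    }

    public static void main(String[] args) {
        // Made-up version numbers; the value -103 stands in for a DBQ listed twice.
        List<Long> versions = Arrays.asList(105L, 104L, -103L, -103L, 101L);
        int requested = naiveRequestedCount(versions);
        int retrievable = distinctRequestedCount(versions);
        System.out.println("Requested " + requested + " updates but retrieved " + retrievable);
    }
}
```

With one duplicated version in the input, the two counts differ by one, which is the same shape as the N versus N-x mismatch in the log message.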

[~rwhaddad] also noticed that the duplicate versions were DBQs i.e. DeleteByQuery versions.

The PeerSync logic could be revised to handle duplicate versions more gracefully, and/or the RealTimeGetComponent logic could be revised to not return duplicate versions, but neither change would explain how the duplicate versions arise so commonly in the first place.

It took some detective work but the issue seems to be with the {{UpdateLog.RecentUpdates.getDeleteByQuery()}} method.
 * [https://github.com/apache/solr/pull/328] proposes to add a {{UpdateLogCloudTest}} to capture the existing behaviour.

As can be seen, DBQ updates can be returned twice. This leads to duplicate versions in the tlog of the receiving instance, and if the receiving instance subsequently supplies versions to another instance, the DBQ version will be supplied twice.
 * [https://github.com/apache/solr/pull/329] (on top of [https://github.com/apache/solr/pull/328]) proposes to change the {{UpdateLog.RecentUpdates.getDeleteByQuery}} method so that it does not return duplicate versions.
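
As a rough sketch of the general idea of not returning duplicate versions, and not the change made in the pull request, the following filters a list of DBQ entries so that each version appears once while the original order is kept. The {{Dbq}} class and the method name are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the actual UpdateLog.RecentUpdates code.
class DbqDedupSketch {

    /** Made-up stand-in for a delete-by-query entry: the query and its version. */
    static final class Dbq {
        final String query;
        final long version;

        Dbq(String query, long version) {
            this.query = query;
            this.version = version;
        }
    }

    /** Returns the entries in their original order, keeping only the first entry per version. */
    static List<Dbq> withoutDuplicateVersions(List<Dbq> dbqs) {
        Set<Long> seenVersions = new HashSet<>();
        List<Dbq> result = new ArrayList<>();
        for (Dbq dbq : dbqs) {
            // Set.add returns false when the version was already seen.
            if (seenVersions.add(dbq.version)) {
                result.add(dbq);
            }
        }
        return result;
    }
}
```

Filtering at the source in this way would mean that a receiving instance never writes the same DBQ version to its tlog twice, so it cannot pass the duplicate on to another instance later.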



> PeerSync failure due to RealTimeGetComponent.getUpdates returning duplicate DBQs
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-15676
>                 URL: https://issues.apache.org/jira/browse/SOLR-15676
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>            Reporter: Christine Poerschke
>            Assignee: Christine Poerschke
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> please see comments for details


