Posted to dev@lucene.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2019/04/05 14:28:00 UTC

[jira] [Commented] (SOLR-13376) Multi-node race condition to create/remove nodeLost markers

    [ https://issues.apache.org/jira/browse/SOLR-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810887#comment-16810887 ] 

Andrzej Bialecki  commented on SOLR-13376:
------------------------------------------

Hmm, indeed there's a race condition here.

The reason for having more than one node attempt to create a nodeLost marker is that more than one node may go away at the same time. (3 was a magic number ;) that we felt wasn't excessive but still reduced the chance of losing the event due to multiple node failures.)

This cleaning of leftover markers in {{OverseerTriggerThread}} was added early on when we added this functionality, and it may not be necessary anymore - there's {{InactiveMarkersPlanAction}} that runs periodically to remove stale markers.
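
To make the interleaving concrete, here is a minimal sketch that replays one possible ordering of events deterministically. It is not Solr code: the class, the in-memory set standing in for the marker znodes in ZK, and the method names {{createMarker}}, {{consumeAndRemove}} and {{replayRace}} are all hypothetical and only illustrate the shape of the problem.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NodeLostMarkerRaceSketch {

    // Hypothetical stand-in for the nodeLost marker znodes: one entry per lost node.
    static final Set<String> markers = ConcurrentHashMap.newKeySet();

    // Assumed behavior of a surviving node's liveNodes listener: create-if-absent.
    // Returns true only for the caller that actually created the marker.
    static boolean createMarker(String lostNode) {
        return markers.add(lostNode);
    }

    // Assumed behavior of a new Overseer: process a leftover marker and delete it.
    // Returns true if a marker existed and was removed.
    static boolean consumeAndRemove(String lostNode) {
        return markers.remove(lostNode);
    }

    // Replays one problematic ordering and reports whether a marker is left behind.
    static boolean replayRace(String lostNode) {
        markers.clear();
        createMarker(lostNode);      // node A notices the lost node and creates the marker
        createMarker(lostNode);      // node B tries too; the marker exists, so this is a no-op
        consumeAndRemove(lostNode);  // the new Overseer processes and cleans up the marker
        createMarker(lostNode);      // node C's listener fires late and re-creates the marker
        return markers.contains(lostNode);
    }

    public static void main(String[] args) {
        System.out.println("stale marker left behind: " + replayRace("node1"));
    }
}
```

In this sketch the create-if-absent check protects against duplicates only while the marker still exists. Once the cleanup has run, a late creator cannot tell "nobody has created it yet" from "it was already processed and removed", which is why a periodic sweep of stale markers may be a safer place for the cleanup.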

> Multi-node race condition to create/remove nodeLost markers
> -----------------------------------------------------------
>
>                 Key: SOLR-13376
>                 URL: https://issues.apache.org/jira/browse/SOLR-13376
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>
> NodeMarkersRegistrationTest.testNodeMarkersRegistration is frequently failing on Jenkins builds in the same spot, with similar-looking logs.
> Although I haven't been able to reproduce these failures locally, I am fairly confident that the problem is a race condition between when/how a new Overseer processes and cleans up "nodeLost" markers in ZK and how other nodes may (mistakenly) re-create those markers in their liveNodes listeners.


