Posted to issues@ignite.apache.org by "Alexey Goncharuk (JIRA)" <ji...@apache.org> on 2018/04/09 11:13:00 UTC

[jira] [Commented] (IGNITE-7871) Implement 2-phase waiting for partition release

    [ https://issues.apache.org/jira/browse/IGNITE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430383#comment-16430383 ] 

Alexey Goncharuk commented on IGNITE-7871:
------------------------------------------

Pavel,

Please check whether we really need the second latch during the exchange. After awaiting local transactions, we should always send the final values of update counters to the other nodes, so it seems to me that the second latch is not required.

Also, please rename LatchManager to something like ExchangeLatchStorage and get rid of the extra topic (TOPIC_CACHE should be enough). The latch implementation relies heavily on the fact that it is created from the exchange future, so an exchange-specific name won't confuse other developers.
Also, please consider moving message sending out of the onNodeLeft() callback: this callback is called from the discovery thread, and message sending can get stuck while a connection is being created. runLocalSafe should do the trick.
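
The suggestion to move message sending out of onNodeLeft() follows a general pattern: a callback running on a shared thread (here, the discovery thread) should only schedule potentially blocking work and return. Below is a minimal plain-Java sketch of that pattern. It assumes a single-thread ExecutorService as a stand-in for what runLocalSafe is meant to provide; the class, the sendLatchMessage method, and the "disco-thread" name are hypothetical and are not Ignite code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a listener callback that must not block the thread
// that invokes it (standing in for the discovery thread). Not Ignite code.
class OffloadFromCallbackSketch {
    // Assumed stand-in for a pool that runs local tasks (cf. runLocalSafe).
    private final ExecutorService localPool = Executors.newSingleThreadExecutor();

    // Records which thread actually performed the (potentially blocking) send.
    final AtomicReference<String> senderThread = new AtomicReference<>();

    // Runs on the "discovery" thread: it only schedules the send and returns.
    void onNodeLeft(String nodeId) {
        localPool.execute(() -> sendLatchMessage(nodeId));
    }

    // Hypothetical send that may stall while a connection is being created.
    private void sendLatchMessage(String nodeId) {
        senderThread.set(Thread.currentThread().getName());
    }

    // Simulates a node-left event on a dedicated "discovery" thread and
    // returns the name of the thread that performed the send.
    static String runDemo() {
        OffloadFromCallbackSketch s = new OffloadFromCallbackSketch();
        Thread discovery = new Thread(() -> s.onNodeLeft("node-2"), "disco-thread");
        discovery.start();
        try {
            discovery.join();
            s.localPool.shutdown();
            s.localPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.senderThread.get();
    }

    public static void main(String[] args) {
        System.out.println("sent from: " + runDemo());
    }
}
```

The callback never touches the network itself, so a slow connection setup delays only the pool thread, not the processing of further discovery events.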

> Implement 2-phase waiting for partition release
> -----------------------------------------------
>
>                 Key: IGNITE-7871
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7871
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>    Affects Versions: 2.4
>            Reporter: Pavel Kovalenko
>            Assignee: Alexey Goncharuk
>            Priority: Major
>             Fix For: 2.5
>
>
> Using the validation implemented in IGNITE-7467, we can observe the following situation.
> Suppose we have a partition owned by nodes N1 (primary) and N2 (backup):
> 1) The exchange is started.
> 2) N2 finishes waiting for partition release and starts creating its Single message (with update counters).
> 3) N1 is still waiting for partition release.
> 4) There is a pending cache update N1 -> N2, which is applied after step 2.
> 5) This update increments the update counters on both N1 and N2.
> 6) N1 finishes waiting for partition release, while N2 has already sent its Single message to the coordinator with an outdated update counter.
> 7) The coordinator sees different partition update counters for N1 and N2. Validation fails even though the data is identical.
> Solution:
> Every server node participating in PME (partition map exchange) should wait until all other server nodes have finished their ongoing updates, i.e. until every node has returned from its wait-for-partition-release method.
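
The race in steps 1-7 can be replayed deterministically to see why validation fails even though both nodes end up with the same data. This is a hypothetical, single-threaded sketch: Node, replay() and validate() are illustrative names, and each Single message is reduced to one update counter per node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded replay of the scenario described above.
// Names are illustrative only, not Ignite internals.
class OutdatedCounterReplay {
    static final class Node {
        final String name;
        long updateCounter; // update counter of the single partition

        Node(String name) {
            this.name = name;
        }
    }

    // Returns the counters the coordinator receives in Single messages.
    static Map<String, Long> replay() {
        Node n1 = new Node("N1"); // primary
        Node n2 = new Node("N2"); // backup
        Map<String, Long> singleMessages = new HashMap<>();

        // 1)-2) Exchange starts; N2 finishes its wait and snapshots its counter.
        singleMessages.put(n2.name, n2.updateCounter);
        // 3)-5) N1 is still waiting; the pending update N1 -> N2 is applied
        //        and increments the counter on both nodes.
        n1.updateCounter++;
        n2.updateCounter++;
        // 6) N1 finishes its wait and snapshots its (newer) counter.
        singleMessages.put(n1.name, n1.updateCounter);
        return singleMessages;
    }

    // 7) Coordinator-side check: all owners must report the same counter.
    static boolean validate(Map<String, Long> counters) {
        return counters.values().stream().distinct().count() == 1;
    }

    public static void main(String[] args) {
        Map<String, Long> msgs = replay();
        System.out.println(msgs + " valid=" + validate(msgs));
    }
}
```

After the replay, both nodes hold counter 1, but N2's message still carries 0, so the check fails on identical data.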
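
The proposed 2-phase wait can be sketched in a single JVM. The sketch assumes each server node is a thread and uses a java.util.concurrent.CountDownLatch as a stand-in for the distributed exchange latch. The real latch is a cross-node protocol, so this is only an analogy. Phase 1 is each node's own wait for partition release. Phase 2 is waiting until every node has counted down before snapshotting update counters. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: nodes are threads, and a CountDownLatch stands in for
// a distributed exchange latch. Not Ignite's actual implementation.
class TwoPhaseReleaseSketch {
    // Returns the counters each node would put into its Single message.
    static Map<String, Long> run() {
        AtomicLong n1Counter = new AtomicLong(); // N1 (primary) update counter
        AtomicLong n2Counter = new AtomicLong(); // N2 (backup) update counter
        CountDownLatch allReleased = new CountDownLatch(2); // one per server node
        Map<String, Long> singleMessages = new ConcurrentHashMap<>();

        Thread n2 = new Thread(() -> {
            // Phase 1: N2 has no local updates in flight, so it finishes at once.
            allReleased.countDown();
            // Phase 2: wait until every node has finished phase 1.
            awaitQuietly(allReleased);
            singleMessages.put("N2", n2Counter.get());
        }, "N2");

        Thread n1 = new Thread(() -> {
            // Phase 1: N1 waits for its pending update N1 -> N2 to complete;
            // the update increments the counter on both nodes.
            n1Counter.incrementAndGet();
            n2Counter.incrementAndGet();
            allReleased.countDown();
            // Phase 2: same barrier as N2.
            awaitQuietly(allReleased);
            singleMessages.put("N1", n1Counter.get());
        }, "N1");

        n2.start();
        n1.start();
        joinQuietly(n2);
        joinQuietly(n1);
        return singleMessages;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, Long> msgs = run();
        boolean valid = msgs.values().stream().distinct().count() == 1;
        System.out.println(msgs + " valid=" + valid);
    }
}
```

N1 counts down only after its pending update to N2 has completed, and N2 snapshots only after the latch reaches zero. Because countDown() happens-before a successful await() return, N2's Single message already includes the update, and both counters match regardless of thread scheduling.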



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)