Posted to issues@ignite.apache.org by "Semyon Danilov (Jira)" <ji...@apache.org> on 2022/04/28 16:03:00 UTC

[jira] [Updated] (IGNITE-14085) Implement message recovery protocol over handshake

     [ https://issues.apache.org/jira/browse/IGNITE-14085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Semyon Danilov updated IGNITE-14085:
------------------------------------
    Description: 
First of all, we should introduce a Communication Recovery Descriptor (CRD), a data structure that holds information about a specific connection between two nodes. It should hold the following data:
* Connection id (because we may have multiple connections between two nodes)
* Count of sent messages
* Count of received messages
* Count of acknowledgments received for sent messages
* Count of acknowledgments sent for received messages
* Queue of sent but not acknowledged messages 

Every connection must have a bound recovery descriptor so that, in case of a connectivity failure, we can resend unacknowledged messages.
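
As a rough illustration of the descriptor described above, here is a minimal Java sketch. All names (RecoveryDescriptorSketch, onSend, onAckReceived and so on) are hypothetical and not taken from the Ignite code base. It also assumes that messages can be represented as plain strings and that acknowledgements are cumulative (an ack carries the total number of messages the peer has received so far); neither assumption is stated in this ticket.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Illustrative sketch of a Communication Recovery Descriptor; all names are hypothetical. */
final class RecoveryDescriptorSketch {
    private final int connectionId;   // distinguishes multiple connections between two nodes
    private long sentCount;           // count of sent messages
    private long receivedCount;       // count of received messages
    private long acksReceived;        // acknowledgments received for sent messages
    private long acksSent;            // acknowledgments sent for received messages

    /** Sent but not yet acknowledged messages, in send order. */
    private final Queue<String> unacknowledged = new ArrayDeque<>();

    RecoveryDescriptorSketch(int connectionId) {
        this.connectionId = connectionId;
    }

    /** Records an outgoing message before it is written to the connection. */
    void onSend(String message) {
        unacknowledged.add(message);
        sentCount++;
    }

    /** Records one incoming (counted) message. */
    void onReceive() {
        receivedCount++;
    }

    /** Records that we acknowledged the first {@code ackedTotal} received messages. */
    void onAckSent(long ackedTotal) {
        acksSent = ackedTotal;
    }

    /** Assumption: the ack is cumulative, so the oldest queued messages are released first. */
    void onAckReceived(long ackedTotal) {
        while (acksReceived < ackedTotal && !unacknowledged.isEmpty()) {
            unacknowledged.remove();
            acksReceived++;
        }
    }

    /** Snapshot of the messages that would have to be resent after a connectivity failure. */
    List<String> unacknowledgedMessages() {
        return new ArrayList<>(unacknowledged);
    }

    int connectionId() { return connectionId; }
    long sentCount() { return sentCount; }
    long receivedCount() { return receivedCount; }
    long acksReceived() { return acksReceived; }
    long acksSent() { return acksSent; }
}
```

In this sketch, sending three messages and then receiving a cumulative ack for two of them leaves exactly one message in the queue, which is the one that would be resent after a reconnect.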

The process of handshake should be as follows:
# Server receives an incoming connection and sends its identity information (launch id, consistent id)
# Client receives the server's information and sends its identity and recovery information (connection id, number of received messages)
# Server receives the client's recovery information and sends its own recovery information
# Server sends all unacknowledged messages, if any exist
# Client sends all unacknowledged messages, if any exist
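
The last two steps depend on each side working out which of its queued messages the peer has not seen. One possible way to compute that from the exchanged "number of received messages" is sketched below. The class and method names are hypothetical, and the sketch assumes that counted messages are implicitly numbered from 1 in send order and that the queue holds the most recently sent messages in that same order.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: choosing which queued messages to resend after the handshake. */
final class HandshakeResendSketch {
    private HandshakeResendSketch() {
    }

    /**
     * @param unacknowledged    locally queued messages, oldest first
     * @param sentCount         total number of messages sent on this connection so far
     * @param peerReceivedCount number of messages the peer reported as received in its handshake
     * @return the queued messages the peer has not received yet
     */
    static List<String> messagesToResend(List<String> unacknowledged, long sentCount, long peerReceivedCount) {
        // Implicit number of the oldest queued message (numbering starts at 1).
        long firstQueuedNumber = sentCount - unacknowledged.size() + 1;

        // Queued messages the peer already has, even though their acks never reached us.
        long alreadyReceived = peerReceivedCount - firstQueuedNumber + 1;

        // Clamp to the queue bounds so inconsistent counters cannot cause an out-of-range index.
        int from = (int) Math.max(0, Math.min(unacknowledged.size(), alreadyReceived));

        return new ArrayList<>(unacknowledged.subList(from, unacknowledged.size()));
    }
}
```

For example, if five messages were sent, the last three are still queued, and the peer reports four received messages, only the fifth message is resent. Skipping messages the peer already has is what keeps non-idempotent messages from being handled twice.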

The process of sending and receiving a message should also change:
Every message we are going to send must first be added to the communication recovery descriptor's message queue, and the sent messages counter must be updated.
After receiving a message, we should send an acknowledgement and update the received messages counter and the sent acknowledgements counter. Acknowledgements could also be batched, for example one ack for every 5 received messages.
After receiving an acknowledgement message, we must remove the acknowledged message from the CRD's queue and update the received acknowledgements counter.
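
The batched acknowledgement mentioned above could look roughly like the following on the receiving side. This is only a sketch: AckBatcherSketch and its methods are hypothetical names, and it assumes a cumulative ack that carries the total number of messages received so far.

```java
import java.util.OptionalLong;

/** Illustrative sketch of receiver-side batched acknowledgements; all names are hypothetical. */
final class AckBatcherSketch {
    private final int batchSize;
    private long receivedCount;   // count of received messages
    private long acksSent;        // number of received messages already covered by a sent ack

    AckBatcherSketch(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /**
     * Records one received message.
     *
     * @return the cumulative count to acknowledge, or empty if no ack is due yet
     */
    OptionalLong onMessageReceived() {
        receivedCount++;

        if (receivedCount - acksSent >= batchSize) {
            acksSent = receivedCount;
            return OptionalLong.of(receivedCount);
        }

        return OptionalLong.empty();
    }

    long receivedCount() { return receivedCount; }
    long acksSent() { return acksSent; }
}
```

With a batch size of 5, the first four messages produce no ack and the fifth produces a single ack for all five. A real implementation would probably also need to flush a pending ack when the connection goes idle, so that the sender's queue does not hold the last few messages indefinitely.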

Extra attention should be paid to counter management: messages are not idempotent, and handling the same message twice can lead to undefined behaviour.

Some messages should not be counted at all (and thus should not be acknowledged), for example acknowledgement messages, handshake messages, and possibly others.
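
One simple way to express this distinction is to tag each message kind with a flag that the send and receive paths consult before touching the counters. The enum below is a hypothetical sketch; the actual set of message kinds is not defined in this ticket.

```java
/** Illustrative sketch: which message kinds take part in counting and acknowledgement. */
enum MessageKindSketch {
    /** Ordinary messages: counted, queued for recovery, and acknowledged. */
    APPLICATION(true),

    /** Acks are not counted, otherwise every ack would require another ack. */
    ACKNOWLEDGEMENT(false),

    /** Handshake messages are exchanged before the recovery counters are in use. */
    HANDSHAKE(false);

    private final boolean counted;

    MessageKindSketch(boolean counted) {
        this.counted = counted;
    }

    /** Returns whether messages of this kind update the counters and expect an ack. */
    boolean isCounted() {
        return counted;
    }
}
```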

It should also be noted that the current messaging API has a public method for sending a message without requiring an acknowledgement; this should be handled appropriately.

  was:The central idea of the recovery protocol is the same as in the current implementation, so a similar idea based on the recovery descriptor needs to be implemented. This means that information about the last sent/received messages should be sent during the handshake and, according to this information, messages that were not received should be sent again.


> Implement message recovery protocol over handshake
> --------------------------------------------------
>
>                 Key: IGNITE-14085
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14085
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Kalashnikov
>            Assignee: Semyon Danilov
>            Priority: Major
>              Labels: iep-66, ignite-3



--
This message was sent by Atlassian Jira
(v8.20.7#820007)