Posted to issues@ignite.apache.org by "Roman Puchkovskiy (Jira)" <ji...@apache.org> on 2023/05/22 10:11:00 UTC
[jira] [Updated] (IGNITE-19227) Wait for schema availability out of JRaft threads

     [ https://issues.apache.org/jira/browse/IGNITE-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Puchkovskiy updated IGNITE-19227:
---------------------------------------
    Description: 
According to [https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast], we might need to wait for schema availability when fetching a schema. If such waits happen inside a PartitionListener, JRaft threads might be blocked for a noticeable amount of time (possibly even seconds). We should avoid this.
h3. In RW transactions

When a primary node is about to process a request, it waits until it has all the schema versions for the corresponding timestamp Top (beginTs or commitTs), i.e. until MS SafeTime >= Top. {*}The wait happens outside of JRaft threads{*}. The primary then obtains the global schema revision SR of the latest schema update that is not later than that timestamp, builds a command (putting that SR inside) and submits it to RAFT.
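
A minimal sketch of the primary-side step in Java. It assumes a hypothetical PrimarySchemaSync class (not an Ignite API) that is notified whenever MS SafeTime advances and that stores schema updates as a map from activation timestamp to SR. The wait is exposed as a CompletableFuture, so the caller can chain command submission onto it instead of blocking a JRaft thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical primary-side helper (not an Ignite API): waits until MS SafeTime
 * reaches the operation timestamp Top, then resolves the SR to embed in the command.
 */
class PrimarySchemaSync {
    // Activation timestamp of each schema update -> global schema revision (SR).
    private final TreeMap<Long, Long> schemaUpdates = new TreeMap<>();
    // Pending waits keyed by the timestamp SafeTime has to reach.
    private final TreeMap<Long, CompletableFuture<Void>> waiters = new TreeMap<>();
    private long safeTime = Long.MIN_VALUE;

    synchronized void registerSchemaUpdate(long activationTs, long revision) {
        schemaUpdates.put(activationTs, revision);
    }

    /** Called when MS SafeTime advances; releases every wait whose timestamp is now covered. */
    void advanceSafeTime(long newSafeTime) {
        List<CompletableFuture<Void>> ready = new ArrayList<>();
        synchronized (this) {
            if (newSafeTime <= safeTime) {
                return;
            }
            safeTime = newSafeTime;
            NavigableMap<Long, CompletableFuture<Void>> covered = waiters.headMap(newSafeTime, true);
            ready.addAll(covered.values());
            covered.clear(); // headMap is a view, so this also removes the entries from 'waiters'
        }
        // Complete outside the lock so dependent stages never run while holding it.
        ready.forEach(f -> f.complete(null));
    }

    /** Completes once MS SafeTime >= ts; a JRaft thread must never block on it. */
    synchronized CompletableFuture<Void> waitForSafeTime(long ts) {
        if (safeTime >= ts) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(ts, k -> new CompletableFuture<>());
    }

    /** SR of the latest schema update that is not later than ts. */
    synchronized long schemaRevisionAt(long ts) {
        Map.Entry<Long, Long> e = schemaUpdates.floorEntry(ts);
        if (e == null) {
            throw new IllegalStateException("No schema known at " + ts);
        }
        return e.getValue();
    }

    /** Waits for SafeTime >= Top, then looks up the SR to put into the RAFT command. */
    CompletableFuture<Long> schemaRevisionForCommand(long top) {
        return waitForSafeTime(top).thenApply(ignored -> schemaRevisionAt(top));
    }
}
```

With updates registered at timestamps 10 (SR 1) and 20 (SR 2), schemaRevisionForCommand(15) stays incomplete until SafeTime reaches 15 and then yields SR 1. TreeMap.floorEntry() expresses "the latest schema update not later than the timestamp" directly.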

When an AppendEntriesRequest is built, the Replicator inspects all the entries it includes, extracts the SR from each of them, takes their maximum (MSR, for ‘max schema revision’) and puts it in the AppendEntriesRequest.
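
The MSR computation on the leader could look like the following sketch. LogEntry and MaxSchemaRevision are hypothetical types; the sketch assumes that entries without an SR (commands of other kinds) are simply skipped, and that a batch with no SR-carrying entries gets no MSR at all.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical view of a RAFT log entry: the SR is absent for commands that do not carry one. */
final class LogEntry {
    final OptionalLong schemaRevision;

    LogEntry(OptionalLong schemaRevision) {
        this.schemaRevision = schemaRevision;
    }
}

final class MaxSchemaRevision {
    private MaxSchemaRevision() {
    }

    /** MSR of a batch: the max SR over its entries, or empty if no entry carries an SR. */
    static OptionalLong of(List<LogEntry> batch) {
        return batch.stream()
                .map(e -> e.schemaRevision)
                .filter(OptionalLong::isPresent)
                .mapToLong(OptionalLong::getAsLong)
                .max();
    }
}
```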

When the request is processed by a follower/learner, it compares the MSR from the request with its locally known MSR (in the Catalog). If the request’s MSR > local MSR, the request is rejected (with reason EBUSY), and the leader retries it after some time. As an optimization, the follower/learner might wait for some time in the hope that the local MSR catches up with the request’s MSR.
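
A sketch of the follower/learner check, assuming a hypothetical FollowerSchemaGuard that reads the local MSR from the Catalog through a LongSupplier and reports its decision as a hypothetical AppendCheck enum rather than a real JRaft status. The check only compares two numbers, so it never blocks the JRaft thread; the optional bounded wait mentioned above is left out.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Hypothetical outcome of the follower-side check (mirrors the EBUSY rejection). */
enum AppendCheck { ACCEPT, REJECT_EBUSY }

final class FollowerSchemaGuard {
    // Supplies the latest schema revision known to the local Catalog (assumed accessor).
    private final LongSupplier localMsr;

    FollowerSchemaGuard(LongSupplier localMsr) {
        this.localMsr = localMsr;
    }

    /**
     * Accepts the request only if all of its commands can be applied without waiting,
     * i.e. request MSR <= local MSR. Requests that carry no MSR are always accepted.
     */
    AppendCheck check(OptionalLong requestMsr) {
        if (requestMsr.isPresent() && requestMsr.getAsLong() > localMsr.getAsLong()) {
            // The leader is expected to retry later; nothing blocks this JRaft thread.
            return AppendCheck.REJECT_EBUSY;
        }
        return AppendCheck.ACCEPT;
    }
}
```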

As we need an additional field in AppendEntriesRequest that will only be used by partition groups, we could add a generic container for properties to this interface, like Map<String, Object> extras().
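
One possible shape of the extras container, as a sketch: AppendEntriesRequestSketch, SimpleAppendEntriesRequest, PartitionExtras and the key name are all hypothetical. Generic RAFT code never interprets extras(); only partition-group code reads the MSR under its own key, and requests of other groups simply lack it.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical, trimmed-down request interface with the proposed generic extras container. */
interface AppendEntriesRequestSketch {
    long term();

    /** Group-specific properties; generic RAFT code passes them through untouched. */
    Map<String, Object> extras();
}

/** Hypothetical immutable implementation, for illustration only. */
final class SimpleAppendEntriesRequest implements AppendEntriesRequestSketch {
    private final long term;
    private final Map<String, Object> extras;

    SimpleAppendEntriesRequest(long term, Map<String, Object> extras) {
        this.term = term;
        this.extras = Map.copyOf(extras);
    }

    @Override
    public long term() {
        return term;
    }

    @Override
    public Map<String, Object> extras() {
        return extras;
    }
}

final class PartitionExtras {
    /** Hypothetical key under which partition groups store the MSR. */
    static final String MAX_SCHEMA_REVISION = "partition.maxSchemaRevision";

    private PartitionExtras() {
    }

    static Map<String, Object> withMsr(long msr) {
        return Map.<String, Object>of(MAX_SCHEMA_REVISION, msr);
    }

    /** Empty when the key is not set, e.g. for requests of non-partition groups. */
    static OptionalLong msr(AppendEntriesRequestSketch request) {
        Object value = request.extras().get(MAX_SCHEMA_REVISION);
        return value instanceof Long ? OptionalLong.of((Long) value) : OptionalLong.empty();
    }
}
```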

To extract the SR from a command, we could simply deserialize the whole command, but that does a lot of unnecessary work. Instead, we might serialize commands that carry an SR in a special way (putting the SR in the very first bytes of the message) so that it can be retrieved efficiently.
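
A sketch of the "SR in the first bytes" layout, assuming a hypothetical format where the SR is written as an 8-byte big-endian long in front of the regular command body. A real format would also need a way to tell SR-carrying commands apart from others (for example, a type tag); here every message is assumed to carry the prefix.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical wire layout for commands carrying an SR: the first 8 bytes hold the SR
 * (big-endian, ByteBuffer's default order), followed by the serialized command body.
 */
final class SrPrefixedCodec {
    static final int SR_BYTES = Long.BYTES;

    private SrPrefixedCodec() {
    }

    static byte[] encode(long schemaRevision, byte[] body) {
        return ByteBuffer.allocate(SR_BYTES + body.length)
                .putLong(schemaRevision)
                .put(body)
                .array();
    }

    /** Reads only the prefix; the command body is never deserialized. */
    static long peekSchemaRevision(byte[] message) {
        return ByteBuffer.wrap(message, 0, SR_BYTES).getLong();
    }

    static byte[] body(byte[] message) {
        return Arrays.copyOfRange(message, SR_BYTES, message.length);
    }
}
```

The Replicator then needs only a constant-time peekSchemaRevision(message) call per entry, regardless of how large or complex the command body is.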

As the primary has already made sure that it has the schema versions needed to execute the command, no waits will be needed on the primary node while executing the RAFT command.

As secondaries/learners refuse AppendEntries requests that they cannot execute without waiting, they will never have to wait in JRaft threads.

The RAFT leader might not be collocated with the primary. To handle this case, we can add the same validation for ActionRequests: pass the required SR inside an ActionRequest, validate it in ActionRequestProcessor and reject requests whose SR is above the local MSR.
h3. In RO transactions

When processing an RO transaction, we just wait for MS SafeTime. This happens outside of RAFT, so no special measures are needed.

  was:
According to [https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast], we might need to wait for schema availability when fetching a schema. If such waits happen inside a PartitionListener, JRaft threads might be blocked for a noticeable amount of time (maybe even seconds). We should avoid this.

For RW transactions, we can fetch the schema needed by the operation on the primary replica before submitting a command to RAFT, so that the possible wait happens in a user's thread.

For RO transactions, this is not a problem because we don't use RAFT for RO transactions.


> Wait for schema availability out of JRaft threads
> -------------------------------------------------
>
>                 Key: IGNITE-19227
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19227
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)