Posted to issues@ozone.apache.org by "István Fajth (Jira)" <ji...@apache.org> on 2022/01/24 13:32:00 UTC

[jira] [Commented] (HDDS-5954) EC: Review the TODOs in GRPC Xceiver client and fix them.

    [ https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481085#comment-17481085 ] 

István Fajth commented on HDDS-5954:
------------------------------------

To summarize further experiments on this, I have created a markdown document about how the gRPC client works in Ozone.
The main points and consequences:
- all calls are made synchronously today, and there are traces of an intent to make only WriteChunk and PutBlock asynchronous
- WriteChunk and PutBlock are not called asynchronously either: the ordering between them is not guaranteed otherwise, so we chose to wait for the result of these calls as well.
- EC will have external synchronization points, so it can afford to call WriteChunk and PutBlock truly asynchronously
- The standalone client is not used for writes except in tests, or when a client asks for the standalone client directly. (Note that RandomKeyGenerator in freon uses the standalone client for writes by default.)
- Caching the StreamObservers introduces a new problem: results are not assigned directly to the completable futures in the client reply, but indirectly, relying on the internal ordering between the two streams of a StreamObserver pair, which is speculative. Moreover, using the same StreamObserver serializes the requests within one stream pair, so even though we get ordering, we cannot get async calls when caching the StreamObservers.
- Even though creating the StreamObserver pairs is costly compared to request processing that costs almost nothing, avoiding that cost does not gain much when processing the requests takes time as well.
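
To illustrate the point about cached StreamObservers, here is a minimal sketch in plain Java with no gRPC dependency. The class SharedStreamSketch and its methods send and onReply are hypothetical and are not taken from XceiverClientGrpc. It assumes replies carry no request id, so the only way to pair a reply with its future is arrival order on the shared stream.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for one cached request/response stream pair.
// None of these names exist in XceiverClientGrpc; this is an illustration only.
public class SharedStreamSketch {
    // Futures waiting for a reply, in the order their requests were sent.
    private final Queue<CompletableFuture<String>> pending = new ArrayDeque<>();

    // "Send" a request on the shared stream and remember its future.
    public CompletableFuture<String> send(String request) {
        CompletableFuture<String> future = new CompletableFuture<>();
        synchronized (pending) {
            pending.add(future);
        }
        return future;
    }

    // Called when a reply arrives. Assumption: the reply has no request id,
    // so it is handed to the oldest pending future.
    public void onReply(String reply) {
        CompletableFuture<String> oldest;
        synchronized (pending) {
            oldest = pending.poll();
        }
        if (oldest != null) {
            oldest.complete(reply);
        }
    }

    public static void main(String[] args) {
        SharedStreamSketch stream = new SharedStreamSketch();
        CompletableFuture<String> first = stream.send("WriteChunk");
        CompletableFuture<String> second = stream.send("PutBlock");

        // The pairing is correct only if replies arrive in send order.
        stream.onReply("reply-1");
        stream.onReply("reply-2");

        System.out.println("first  -> " + first.join());
        System.out.println("second -> " + second.join());
    }
}
```

In this sketch, a reply that arrived out of order would silently complete the wrong future, which is the indirect, ordering-dependent assignment described above.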

Because of all this, and because the standalone client is not deprecated and can be used from the CLI, we should not modify the XceiverClientGrpc class directly; instead, we can specialize it for EC.
There is also another task we should have done earlier: as RATIS replication is the default, we should have switched our tests to use RATIS replication, and we should deprecate the Standalone client for writes and state that explicitly.
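
The difference between a call that only has an async signature and a really async call with an external synchronization point can be sketched as follows. This is a rough illustration under stated assumptions: AsyncWriteSketch, sendAsync, sendAndWait and sendAllThenWait are hypothetical names, the "datanode call" is simulated on a thread pool, and where EC actually places its synchronization points is not specified in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not the XceiverClientGrpc API.
public class AsyncWriteSketch {
    // Simulated remote call: returns immediately with a pending future.
    public static CompletableFuture<String> sendAsync(String cmd, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> cmd + ":ok", pool);
    }

    // Async signature, synchronous behavior: the caller gets a future,
    // but only after the call has already completed.
    public static CompletableFuture<String> sendAndWait(String cmd, ExecutorService pool) {
        CompletableFuture<String> future = sendAsync(cmd, pool);
        future.join(); // blocks before returning
        return future;
    }

    // Really async: issue every call first, then wait once at a single
    // caller-side synchronization point (an assumption for this sketch).
    public static List<String> sendAllThenWait(List<String> cmds, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String cmd : cmds) {
            futures.add(sendAsync(cmd, pool));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            results.add(future.join()); // already complete, does not block
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(sendAndWait("WriteChunk", pool).isDone());
            System.out.println(sendAllThenWait(
                Arrays.asList("WriteChunk-1", "WriteChunk-2", "WriteChunk-3"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```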

To get there, I am cancelling the PR for this JIRA and moving the work under the new tickets, and I am closing this ticket as well.
I have created the following JIRAs to track this effort further:
HDDS-6217 - Cleanup XceiverClientGrpc TODOs, and document how the client works and should be used.
HDDS-6218 - Deprecate the standalone client for writes
HDDS-6219 - Switch to RATIS ReplicationType from STAND_ALONE in our tests
HDDS-6220 - EC: Introduce a gRPC client implementation for EC with really async WriteChunk and PutBlock (on EC branch)

> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
>                 Key: HDDS-5954
>                 URL: https://issues.apache.org/jira/browse/HDDS-5954
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Uma Maheswara Rao G
>            Assignee: István Fajth
>            Priority: Major
>              Labels: pull-request-available
>
> Currently there are 4 TODO-s in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, which states that it is not async. The code on the other hand seems to be returning a CompletableFuture instance wrapped inside an XceiverClientReply, though sometimes we wait on the future before really returning.
> 3. L452 notes that async requests are served out of order, and this should be revisited if we make the API async.
> 4. L483 is connected to #2, and it notes that we should reuse stream observers if we are going down the async route
> The latter three require deeper investigation and understanding, to see how we can approach fixing them, and to figure out whether we really need to fix them.


