Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2021/12/20 17:54:00 UTC

[jira] [Created] (KUDU-3345) Enforce hard memory limit upon accepting a request into the server RPC queue

Alexey Serbin created KUDU-3345:
-----------------------------------

             Summary: Enforce hard memory limit upon accepting a request into the server RPC queue
                 Key: KUDU-3345
                 URL: https://issues.apache.org/jira/browse/KUDU-3345
             Project: Kudu
          Issue Type: Improvement
          Components: master, tserver
    Affects Versions: 1.15.0
            Reporter: Alexey Serbin


As of version 1.15.0, both {{kudu-tserver}} and {{kudu-master}} ignore current memory usage when admitting requests into the RPC queue.  The only limit checked by {{ServicePool::QueueInboundCall()}} is the current length of the RPC service queue, which is controlled by the {{--rpc_service_queue_length}} flag.

Given that a single incoming request can be as large as {{--rpc_max_message_size}} (50 MiB by default) and {{--rpc_service_queue_length}} might be set high to accommodate a surge of incoming requests, Kudu servers can exceed the hard memory limit controlled by the {{--memory_limit_hard_bytes}} flag.  In addition, the Raft prepare queue does not appear to impose a limit on the total size of the requests accumulated in the queue.  If a Kudu server consumes too much memory, it might exit unexpectedly: either the OOM killer terminates it, or the {{new}} operator throws {{std::bad_alloc}} and the C++ runtime aborts the process with {{SIGABRT}}, since memory allocation failures are not handled in the Kudu code.

At the very least, we have seen evidence of such a situation when disk I/O was very slow and {{kudu-tserver}} accumulated many requests in its prepare queue (probably due to a particular workload pattern that first sent many small write requests and then followed up with large ones).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)