You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2017/10/25 23:44:01 UTC

[jira] [Created] (KUDU-2201) Write RPC reports negative timeout

Alexey Serbin created KUDU-2201:
-----------------------------------

             Summary: Write RPC reports negative timeout
                 Key: KUDU-2201
                 URL: https://issues.apache.org/jira/browse/KUDU-2201
             Project: Kudu
          Issue Type: Bug
          Components: client, rpc
    Affects Versions: 1.6.0
         Environment: A machine with relatively small amount of RAM: 8GB

Both tablet server and the kudu CLI tool were running at the same machine.
            Reporter: Alexey Serbin


[~dan@danburkert.com] ran the following twice on a machine with 8GB of RAM, while running the tablet server there as well:

{noformat}
$ kudu perf loadgen kudu-training-1-1.gce.cloudera.com --table-name=clicks --use-random --num-rows-per-thread=10000000
{noformat}

The subsequent run uses the same random seed, so all 20M rows were duplicates, which caused 20M insertion errors.  The OOM killer killed the tablet server, but the client also experienced some errors, which lead to bad-alloc exception and reporting of _negative_ timeout:

{noformat}
W1025 16:03:44.394037 14071 batcher.cc:349] Timed out: Failed to write batch of 506 ops to tablet 915e587ef6df47e8aae8a6c36cc37d56 after 1 attempt(s): Failed to write to server: bf76454db52c4b9f82e29a2783621439 (kudu-training-1-4.gce.cloudera.com:7050): Write RPC to 172.31.120.27:7050 timed out after -15.216s (ON_OUTBOUND_QUEUE)
{noformat}

The tail of the error output from 'kudu perf loadgen': 
{noformat}
W1025 16:03:44.393877 14071 connection.cc:221] RPC call timeout handler was delayed by 18.8709s! This may be due to a process-wide pause such as swapping, logging-related delays, or allocator lock contention. Will allow an additional 0.999165s for a response.
W1025 16:03:44.393882 14071 connection.cc:221] RPC call timeout handler was delayed by 18.8466s! This may be due to a process-wide pause such as swapping, logging-related delays, or allocator lock contention. Will allow an additional 0.999261s for a response.
W1025 16:03:44.393888 14071 connection.cc:221] RPC call timeout handler was delayed by 18.8442s! This may be due to a process-wide pause such as swapping, logging-related delays, or allocator lock contention. Will allow an additional 0.99953s for a response.
W1025 16:03:44.393893 14071 connection.cc:221] RPC call timeout handler was delayed by 18.8419s! This may be due to a process-wide pause such as swapping, logging-related delays, or allocator lock contention. Will allow an additional 0.999786s for a response.
W1025 16:03:44.393898 14071 connection.cc:221] RPC call timeout handler was delayed by 18.8399s! This may be due to a process-wide pause such as swapping, logging-related delays, or allocator lock contention. Will allow an additional 0.999999s for a response.
W1025 16:03:44.394018 14071 meta_cache.cc:217] Tablet 915e587ef6df47e8aae8a6c36cc37d56: Replica bf76454db52c4b9f82e29a2783621439 (kudu-training-1-4.gce.cloudera.com:7050) has failed: Timed out: Write RPC to 172.31.120.27:7050 timed out after -15.216s (ON_OUTBOUND_QUEUE)
W1025 16:03:44.394037 14071 batcher.cc:349] Timed out: Failed to write batch of 506 ops to tablet 915e587ef6df47e8aae8a6c36cc37d56 after 1 attempt(s): Failed to write to server: bf76454db52c4b9f82e29a2783621439 (kudu-training-1-4.gce.cloudera.com:7050): Write RPC to 172.31.120.27:7050 timed out after -15.216s (ON_OUTBOUND_QUEUE)
W1025 16:04:34.604554 14071 outbound_call.cc:286] RPC callback for RPC call kudu.tserver.TabletServerService.Write -> {remote=172.31.120.27:7050 (kudu-training-1-4.gce.cloudera.com), user_credentials={real_user=root}} blocked reactor thread for 65614.7us
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)