Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/12/10 19:41:00 UTC

[jira] [Commented] (KUDU-1586) If a single op is larger than consensus_max_batch_size_bytes, consensus gets stuck

    [ https://issues.apache.org/jira/browse/KUDU-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457339#comment-17457339 ] 

ASF subversion and git services commented on KUDU-1586:
-------------------------------------------------------

Commit d0243afe2ed93aeb18e318068df1bc02de72ad1a in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d0243af ]

[consensus] minor clean-up on LogCache

Since I was already looking into the LogCache code,
I went ahead and did some minor clean-up, such as

  * removing unused code
  * fixing code style
  * simplifying the going-over-max_size_bytes condition in ReadOps(),
    making sure the regression test for KUDU-1586 passes
  * fixing signed/unsigned comparison warning for a Raft op's index and
    the index of the corresponding entry in the cache
  * other miscellaneous minor updates

Change-Id: I48f60c44209e269eb6b00278c6e32d4398ef9a55
Reviewed-on: http://gerrit.cloudera.org:8080/18081
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Alexey Serbin <as...@cloudera.com>


> If a single op is larger than consensus_max_batch_size_bytes, consensus gets stuck
> ----------------------------------------------------------------------------------
>
>                 Key: KUDU-1586
>                 URL: https://issues.apache.org/jira/browse/KUDU-1586
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 0.10.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> I noticed on a cluster test that a leader was spinning with log messages like:
> I0829 14:17:31.870786 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> I0829 14:17:31.873234  6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> I0829 14:17:31.875713 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> I0829 14:17:31.878078  6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> After investigation, it seems this op was larger than 1MB (the default consensus batch size), which caused this tight loop with no progress.
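
For illustration only, here is a minimal sketch of how a size-capped batch
reader can stall on a single oversized op, and one common way to guarantee
progress. It is not Kudu's actual LogCache::ReadOps() code: the Op struct and
the functions ReadBatchStrict() and ReadBatchAtLeastOne() are hypothetical
names invented for this example, and the real stall in KUDU-1586 may have
arisen at a different layer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a replicated op: only its log index and its
// serialized size matter for this illustration.
struct Op {
  int64_t index;
  size_t size_bytes;
};

// Strict variant: never lets the batch exceed max_size_bytes. If the op at
// 'start' is itself larger than the limit, the batch comes back empty, and a
// caller that retries from the same position can never make progress.
std::vector<Op> ReadBatchStrict(const std::vector<Op>& log,
                                size_t start,
                                size_t max_size_bytes) {
  std::vector<Op> batch;
  size_t total = 0;
  for (size_t i = start; i < log.size(); ++i) {
    if (total + log[i].size_bytes > max_size_bytes) break;
    total += log[i].size_bytes;
    batch.push_back(log[i]);
  }
  return batch;
}

// Progress-preserving variant: the size limit is enforced only once the
// batch already holds at least one op, so an oversized op is sent alone.
std::vector<Op> ReadBatchAtLeastOne(const std::vector<Op>& log,
                                    size_t start,
                                    size_t max_size_bytes) {
  std::vector<Op> batch;
  size_t total = 0;
  for (size_t i = start; i < log.size(); ++i) {
    if (!batch.empty() && total + log[i].size_bytes > max_size_bytes) break;
    total += log[i].size_bytes;
    batch.push_back(log[i]);
  }
  return batch;
}
```

With a 1 MiB limit and a 2 MB op at the read position, the strict variant
returns nothing, while the second variant returns that one op by itself and
lets the reader advance past it.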



--
This message was sent by Atlassian Jira
(v8.20.1#820001)