Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/12/10 19:41:00 UTC
[jira] [Commented] (KUDU-1586) If a single op is larger than consensus_max_batch_size_bytes, consensus gets stuck
[ https://issues.apache.org/jira/browse/KUDU-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457339#comment-17457339 ]
ASF subversion and git services commented on KUDU-1586:
-------------------------------------------------------
Commit d0243afe2ed93aeb18e318068df1bc02de72ad1a in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d0243af ]
[consensus] minor clean-up on LogCache
Since I was looking into the LogCache code, I went ahead and did
a minor clean-up, such as:
* removing unused code
* fixing code style
* simplifying the going-over-max_size_bytes condition in ReadOps(),
  making sure the regression test for KUDU-1586 passes
* fixing signed/unsigned comparison warning for a Raft op's index and
  the index of the corresponding entry in the cache
* other unsorted minor updates
Change-Id: I48f60c44209e269eb6b00278c6e32d4398ef9a55
Reviewed-on: http://gerrit.cloudera.org:8080/18081
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Alexey Serbin <as...@cloudera.com>
> If a single op is larger than consensus_max_batch_size_bytes, consensus gets stuck
> ----------------------------------------------------------------------------------
>
> Key: KUDU-1586
> URL: https://issues.apache.org/jira/browse/KUDU-1586
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 0.10.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Fix For: 1.0.0
>
>
> I noticed on a cluster test that a leader was spinning with log messages like:
> I0829 14:17:31.870786 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> I0829 14:17:31.873234 6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> I0829 14:17:31.875713 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> I0829 14:17:31.878078 6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
> After investigation, it seems this op was larger than 1MB (default consensus batch size) and this caused this tight loop behavior with no progress.
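For illustration only, here is a minimal sketch of the kind of size check that avoids the stall described above. It is not Kudu's actual ReadOps() code: the Op struct, the ReadBatch() function, and its parameters are hypothetical names made up for this example. The idea is that the size limit is applied only once the batch already holds at least one op, so a single op larger than the limit is still returned on its own and the reader can move past it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a replicated op; only its index and
// serialized size matter for this sketch.
struct Op {
  int64_t index;
  size_t size_bytes;
};

// Collects consecutive ops from 'log' starting at position 'start',
// stopping before the op that would push the batch over 'max_size_bytes'.
// The first op is always included, even if it alone exceeds the limit,
// so the caller always makes progress when any op is available.
std::vector<Op> ReadBatch(const std::vector<Op>& log,
                          size_t start,
                          size_t max_size_bytes) {
  assert(start <= log.size());
  std::vector<Op> batch;
  size_t total_bytes = 0;
  for (size_t i = start; i < log.size(); ++i) {
    const Op& op = log[i];
    // Enforce the limit only when the batch is non-empty.
    if (!batch.empty() && total_bytes + op.size_bytes > max_size_bytes) {
      break;
    }
    batch.push_back(op);
    total_bytes += op.size_bytes;
  }
  return batch;
}
```

With a 1MB limit and a 2MB op at the head of the range, this sketch returns a batch holding just that op. If the limit were enforced on an empty batch as well, the op would never fit into any batch and the caller would keep retrying the same range without advancing.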
--
This message was sent by Atlassian Jira
(v8.20.1#820001)