Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2018/12/17 21:01:00 UTC

[jira] [Created] (KUDU-2642) Truncate follower's "last status" message logged by leader and shorten some replica status messages

Will Berkeley created KUDU-2642:
-----------------------------------

             Summary: Truncate follower's "last status" message logged by leader and shorten some replica status messages
                 Key: KUDU-2642
                 URL: https://issues.apache.org/jira/browse/KUDU-2642
             Project: Kudu
          Issue Type: Improvement
            Reporter: Will Berkeley
            Assignee: Will Berkeley


When a follower fails, the leader keeps trying to heartbeat to it while it remains in the config. Each time a heartbeat fails (every 0.5s), the leader logs the failure status of the replica returned by the remote's tablet manager.

This status can be VERY long:

{noformat}
FAILED Data state: TABLET_DATA_READY Last status: Invalid argument: Failed log replay. Reason: Debug Info: Error playing
entry 1389 of segment 27967 of tablet 578f2c6e60d84cb18d704889ea323cda. Segment path: /data/01/kudu/wal/wals/578f2c6e60d84cb18d704889ea323cda.recovery/wal-000027967. Entry: type: COMMIT
commit { op_type: WRITE_OP commited_op_id { term: 2534 index: 160307647 } result { ops { mutated_stores { rs_id: 118489 dms_id: 47 } } } }: Failed to play WRITE_OP request.
ReplicateMsg: { id { term: 2534 index: 160307647 } timestamp: 6317269870025498624 op_type: WRITE_OP write_request
(entire write op request here, possibly thousands of individual ops)
{noformat}

We should truncate status messages when logged on the leader at some reasonable size, like 4096 characters, to keep them from ballooning the logs with redundant information.
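
As a rough illustration of the truncation idea, here is a minimal standalone sketch. The helper name TruncateForLog and the wording of the suffix are hypothetical and not taken from the Kudu code base; the 4096 default is just the size suggested above. Note that it counts bytes, not characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an existing Kudu function): caps a status message
// at max_len bytes and appends a note saying how much was cut off, so the
// reader knows the full text exists elsewhere (e.g. in the follower's log).
std::string TruncateForLog(const std::string& msg, std::size_t max_len = 4096) {
  if (msg.size() <= max_len) {
    return msg;  // Short enough: log it unchanged.
  }
  const std::size_t dropped = msg.size() - max_len;
  return msg.substr(0, max_len) + "... (" + std::to_string(dropped) +
         " more characters truncated)";
}
```

The leader would apply something like this only at the point where it logs the follower's status, so the status itself stays intact for any other consumer.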

Maybe we should also consider collapsing these messages into a LOG_EVERY_N_SECS.
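
To make the rate-limiting idea concrete, here is a small sketch of time-based throttling. It is not the LOG_EVERY_N_SECS macro itself; the class LogThrottle and its interface are assumptions made up for illustration, and it is not thread-safe.

```cpp
#include <chrono>

// Hypothetical throttle (not a Kudu or glog API): allows at most one log
// message per interval and counts how many were suppressed in between.
class LogThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  explicit LogThrottle(std::chrono::seconds interval) : interval_(interval) {}

  // Returns true if a message should be logged at time `now`. On true,
  // *suppressed is set to the number of messages skipped since the last
  // emitted one. On false, *suppressed is left untouched.
  bool ShouldLog(Clock::time_point now, int* suppressed) {
    if (has_logged_ && now - last_ < interval_) {
      ++suppressed_;
      return false;
    }
    *suppressed = suppressed_;
    suppressed_ = 0;
    has_logged_ = true;
    last_ = now;
    return true;
  }

 private:
  const std::chrono::seconds interval_;
  Clock::time_point last_{};
  bool has_logged_ = false;
  int suppressed_ = 0;
};
```

A caller would keep one throttle per failing peer, pass LogThrottle::Clock::now() on each failed heartbeat, and include the suppressed count in the message it does emit. With a 0.5s heartbeat and a 5s interval, roughly one in ten failures would reach the log.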

The whole message is only logged once on the follower (ironically). We should probably shorten this specific one too, because it's hard to see how logging it helps. If the log message contains the entry number, segment number, term, index, etc., that's already enough information to find the whole write op in the log if needed.



