Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2019/09/20 17:13:00 UTC

[jira] [Commented] (KUDU-2948) ksck claims output is different but it’s actually the same

    [ https://issues.apache.org/jira/browse/KUDU-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934587#comment-16934587 ] 

Alexey Serbin commented on KUDU-2948:
-------------------------------------

I suspect the mismatch was due to a Raft config term mismatch: the term of the Raft configuration for the reported tablet had been bumped (i.e. at least one election happened after the master last received a tablet report), but the master didn't get the updated config, and the same leader was elected again.

To avoid confusion, it would be nice to enhance the message on the configuration mismatch with details on what the actual difference is: leader, set of voters, term mismatch, etc.
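
As a rough illustration of that suggestion, here is a minimal Python sketch of a comparison that reports which field differs. This is not Kudu's actual ksck code (ksck is written in C++); the names ConsensusState and describe_mismatch are hypothetical, and the master's term of 2938 in the usage note is an assumed value, since the ksck output in this report shows no term for the master.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ConsensusState:
    """Hypothetical, simplified view of one source's consensus state."""
    leader: Optional[str]
    voters: FrozenSet[str]
    term: Optional[int] = None  # None when the source reports no term


def describe_mismatch(master: ConsensusState, replica: ConsensusState) -> List[str]:
    """Return human-readable reasons why the two states differ (empty if none)."""
    reasons = []
    if master.leader != replica.leader:
        reasons.append(f"leader: master={master.leader} replica={replica.leader}")
    if master.voters != replica.voters:
        only_master = sorted(master.voters - replica.voters)
        only_replica = sorted(replica.voters - master.voters)
        reasons.append(
            f"voters: only on master={only_master} only on replica={only_replica}"
        )
    if master.term != replica.term:
        reasons.append(f"term: master={master.term} replica={replica.term}")
    return reasons
```

With an assumed master state of ConsensusState("B", frozenset("ABC"), 2938) and a replica state of ConsensusState("B", frozenset("ABC"), 2939), the function returns a single reason naming the term, even though the leader and the set of voters are identical.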

> ksck claims output is different but it’s actually the same
> ----------------------------------------------------------
>
>                 Key: KUDU-2948
>                 URL: https://issues.apache.org/jira/browse/KUDU-2948
>             Project: Kudu
>          Issue Type: Bug
>          Components: ksck
>    Affects Versions: 1.7.0
>         Environment: RHEL 7.7
>            Reporter: Abhishek
>            Priority: Major
>
> I came across a scenario where ksck reports that a tablet has mismatched consensus and that the replicas' configs disagree with the master's, but the configs appear to be identical:
> The consensus matrix is:
>  Config source |   Replicas   | Current term | Config index | Committed?
> ---------------+--------------+--------------+--------------+------------
>  master        | A B* C       |              |              | Yes
>  A             | A B* C       | 2939         | 20571        | Yes
>  B             | A B* C       | 2939         | 20571        | Yes
>  C             | A B* C       | 2939         | 20571        | Yes
> Tablet 8137349615944d45a0897090d36d7a08 of table 'impala::<TableName>' is conflicted: Tablet 8137349615944d45a0897090d36d7a08 of table 'impala::<TableName>' replicas' active configs disagree with the master's
>  6684505cec6f4a49b3442786cebdf06d (<serverFQDN>:7050): RUNNING
>  8a3232953edd4ba79d20711d4ea3581d (<serverFQDN>:7050): RUNNING [LEADER]
>  c8b68cb1366c45199668247d4d7c0295 (<serverFQDN>:7050): RUNNING
> All the peers reported by the master and tablet servers are:
>  A = 6684505cec6f4a49b3442786cebdf06d
>  B = 8a3232953edd4ba79d20711d4ea3581d
>  C = c8b68cb1366c45199668247d4d7c0295
>  
> The consensus matrix is:
>  Config source |   Replicas   | Current term | Config index | Committed?
> ---------------+--------------+--------------+--------------+------------
>  master        | A B* C       |              |              | Yes
>  A             | A B* C       | 5030         | 6114195      | Yes
>  B             | A B* C       | 5030         | 6114195      | Yes
>  C             | A B* C       | 5030         | 6114195      | Yes
> Table impala::<TableName> has 1 tablet(s) with mismatched consensus
>  1b6a44eadcd145f693390587c4e3308a (<serverFQDN>:7050): RUNNING
>  36f1c8c3863a49778adeb6f40c73aa26 (<serverFQDN>:7050): RUNNING [LEADER]
>  6c8fc8452f374672abdae3098a198101 (<serverFQDN>:7050): RUNNING
> 0 replicas' active configs differ from the master's.
>  All the peers reported by the master and tablet servers are:
>  A = 1b6a44eadcd145f693390587c4e3308a
>  B = 36f1c8c3863a49778adeb6f40c73aa26
>  C = 6c8fc8452f374672abdae3098a198101
>  
> The tablet servers are under high load: I noticed many backpressure messages. The drives for data_dirs and wal_dirs are encrypted using navencrypt.


