You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/03 15:45:00 UTC

[jira] [Updated] (KUDU-2942) A rare flaky test for the aggregated live row count

     [ https://issues.apache.org/jira/browse/KUDU-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-2942:
------------------------------
    Component/s: test

> A rare flaky test for the aggregated live row count
> ---------------------------------------------------
>
>                 Key: KUDU-2942
>                 URL: https://issues.apache.org/jira/browse/KUDU-2942
>             Project: Kudu
>          Issue Type: Bug
>          Components: test
>            Reporter: LiFu He
>            Priority: Major
>         Attachments: ts_tablet_manager-itest.txt
>
>
> A few days ago, Adar met a rare flaky test for the live row count in TSAN mode.
>  
> {code:java}
> // code placeholder
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_tablet_manager-itest.cc:642
>       Expected: live_row_count
>       Which is: 327
> To be equal to: table_info->GetMetrics()->live_row_count->value()
>       Which is: 654
> {code}
> It seems the metric value is doubled. And his full test output is in the attachment.
>  
> I reviewed the previous patches and made some unusual guesses. I think one of them could explain the issue:
> When one master just becomes the leader and there are two heartbeat messages from the same tserver that are processed in parallel at [Line4239|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/master/catalog_manager.cc#L4239], then the metric value will be doubled because the old tablet stats can be accessed concurrently.
> Thus, the question becomes how to generate two heartbeat messages from the same tserver at the same time? The possible answer is: [First heartbeat message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L741] and [Second heartbeat message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L635]
> Please don't forget the above case is integrate test environment, not product.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)