Posted to issues@kudu.apache.org by "Adar Dembo (Jira)" <ji...@apache.org> on 2019/09/17 20:51:00 UTC

[jira] [Commented] (KUDU-2942) A rare flaky test for the aggregated live row count

    [ https://issues.apache.org/jira/browse/KUDU-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931804#comment-16931804 ] 

Adar Dembo commented on KUDU-2942:
----------------------------------

I don't see how the two lines you've described can lead to concurrent heartbeats from the same tserver. Remember that TriggerASAP() doesn't actually send a heartbeat; it just instructs the tserver's heartbeater thread to send a heartbeat as soon as it can (and maybe wakes up the thread if it was asleep). There's still just one heartbeating thread per tserver per master, and the heartbeat is synchronous (i.e. the heartbeating thread waits for the master to respond before continuing its execution).
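
The threading model described above can be sketched roughly as follows. This is a minimal standalone illustration, not Kudu's actual Heartbeater code: FakeMaster, the 50 ms interval, the member names, and the MaxConcurrentHeartbeats() helper are all hypothetical. It assumes only what the comment states: TriggerASAP() sets a flag and wakes the thread, and a single thread sends each heartbeat synchronously.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a master: tracks how many heartbeats from this
// tserver are being processed at the same moment.
class FakeMaster {
 public:
  void ProcessHeartbeat() {
    int now = ++in_flight_;
    int prev = max_in_flight_.load();
    while (now > prev && !max_in_flight_.compare_exchange_weak(prev, now)) {}
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // simulated work
    ++processed_;
    --in_flight_;
  }
  int max_in_flight() const { return max_in_flight_.load(); }
  int processed() const { return processed_.load(); }

 private:
  std::atomic<int> in_flight_{0};
  std::atomic<int> max_in_flight_{0};
  std::atomic<int> processed_{0};
};

// Hypothetical heartbeater: one thread per (tserver, master) pair.
class Heartbeater {
 public:
  explicit Heartbeater(FakeMaster* master)
      : master_(master), thread_([this] { RunThread(); }) {}

  ~Heartbeater() {
    {
      std::lock_guard<std::mutex> l(lock_);
      should_run_ = false;
    }
    cond_.notify_one();
    thread_.join();
  }

  // Does not send anything itself: it only sets a flag and wakes the thread.
  void TriggerASAP() {
    {
      std::lock_guard<std::mutex> l(lock_);
      heartbeat_asap_ = true;
    }
    cond_.notify_one();
  }

 private:
  void RunThread() {
    std::unique_lock<std::mutex> l(lock_);
    while (should_run_) {
      // Sleep until the (assumed) heartbeat interval elapses or we are woken.
      cond_.wait_for(l, std::chrono::milliseconds(50),
                     [this] { return heartbeat_asap_ || !should_run_; });
      if (!should_run_) break;
      heartbeat_asap_ = false;
      l.unlock();
      master_->ProcessHeartbeat();  // synchronous: returns after the "response"
      l.lock();
    }
  }

  FakeMaster* master_;
  std::mutex lock_;
  std::condition_variable cond_;
  bool heartbeat_asap_ = false;
  bool should_run_ = true;
  std::thread thread_;  // declared last so the other members exist first
};

// Calls TriggerASAP() from several threads at once and returns the highest
// number of heartbeats the fake master ever saw in flight simultaneously.
int MaxConcurrentHeartbeats(int num_callers, int triggers_per_caller) {
  FakeMaster master;
  {
    Heartbeater hb(&master);
    std::vector<std::thread> callers;
    for (int i = 0; i < num_callers; ++i) {
      callers.emplace_back([&hb, triggers_per_caller] {
        for (int j = 0; j < triggers_per_caller; ++j) hb.TriggerASAP();
      });
    }
    for (auto& t : callers) t.join();
    // Wait until at least one heartbeat has been fully processed.
    while (master.processed() < 1) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }  // ~Heartbeater() stops and joins the heartbeating thread here.
  return master.max_in_flight();
}
```

In this sketch, any number of concurrent TriggerASAP() calls collapse into the same flag, so the fake master never sees more than one heartbeat in flight from this heartbeater.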


> A rare flaky test for the aggregated live row count
> ---------------------------------------------------
>
>                 Key: KUDU-2942
>                 URL: https://issues.apache.org/jira/browse/KUDU-2942
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: HeLifu
>            Priority: Major
>         Attachments: ts_tablet_manager-itest.txt
>
>
> A few days ago, Adar hit a rare flaky test failure for the live row count in TSAN mode.
>  
> {code:java}
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_tablet_manager-itest.cc:642
>       Expected: live_row_count
>       Which is: 327
> To be equal to: table_info->GetMetrics()->live_row_count->value()
>       Which is: 654
> {code}
> The metric value appears to be doubled: 654 is exactly twice the expected 327. The full test output is in the attachment.
>  
> I reviewed the previous patches and came up with a few speculative guesses. I think one of them could explain the issue:
> If a master has just become the leader and two heartbeat messages from the same tserver are processed in parallel at [Line4239|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/master/catalog_manager.cc#L4239], the metric value will be doubled because both heartbeats can access the old tablet stats concurrently.
> The question then becomes: how can two heartbeat messages from the same tserver be generated at the same time? A possible answer is: [First heartbeat message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L741] and [Second heartbeat message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L635]
> Note that this scenario applies to the integration test environment, not to production.

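For reference, the doubling hypothesis in the issue description can be illustrated with a deterministic sketch. It is not taken from catalog_manager.cc: it assumes, hypothetically, that the table-level metric is adjusted by the difference between the newly reported tablet stats and the previously stored ones, and it replays the suspected interleaving by hand rather than with real threads. The names TabletStats, TableMetric, and SimulateReports are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical per-tablet stats last stored by the master.
struct TabletStats {
  int64_t live_row_count = 0;
};

// Hypothetical table-level aggregate, assumed to be updated by delta.
struct TableMetric {
  int64_t live_row_count = 0;
};

// Processes two identical reports for one tablet and returns the resulting
// table-level live row count. With interleaved == true, both reports read
// the old stats before either one stores the new stats.
int64_t SimulateReports(int64_t reported_rows, bool interleaved) {
  TabletStats stored;
  TableMetric metric;
  if (interleaved) {
    TabletStats old_a = stored;  // report A snapshots the old stats (0 rows)
    TabletStats old_b = stored;  // report B snapshots the same old stats
    metric.live_row_count += reported_rows - old_a.live_row_count;
    stored.live_row_count = reported_rows;
    metric.live_row_count += reported_rows - old_b.live_row_count;
    stored.live_row_count = reported_rows;
  } else {
    for (int i = 0; i < 2; ++i) {
      TabletStats old_stats = stored;  // the second pass sees the new stats
      metric.live_row_count += reported_rows - old_stats.live_row_count;
      stored.live_row_count = reported_rows;
    }
  }
  return metric.live_row_count;
}
```

Under these assumptions, SimulateReports(327, false) yields 327, while SimulateReports(327, true) yields 654, matching the values in the failed assertion. Whether two heartbeats from one tserver can actually be processed in parallel is the open question addressed in the comment above.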

