Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2023/03/06 09:33:00 UTC

[jira] [Commented] (FLINK-26624) Running HA (hashmap, async) end-to-end test failed on azure due to unable to find master logs

    [ https://issues.apache.org/jira/browse/FLINK-26624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696816#comment-17696816 ] 

Matthias Pohl commented on FLINK-26624:
---------------------------------------

I'm not sure whether the other build failures had the same cause, but there is a test instability caused by the test code itself: the file list is not sorted before {{uniq}} is called on it in [flink-end-to-end-tests/test-scripts/common_ha.sh:52|https://github.com/apache/flink/blob/aee12bc412559a50a419907fff51df5f91fc6b52/flink-end-to-end-tests/test-scripts/common_ha.sh#L52].
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46800&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=f8a6d3eb-38cf-5cca-9a99-d0badeb5fe62&l=13075

As a result, the equality check in line 53 fails because 4 files are detected instead of the expected 3:
{code}
$ grep -r --include "*$standalonesession*.log*" -e "Completed checkpoint" . | cut -d ":" -f 1 | sed "s/\.[0-9]\{1,\}$//g" | uniq
./flink-vsts-standalonesession-2-fv-az26-851.log
./flink-vsts-standalonesession-0-fv-az26-851.log
./flink-vsts-standalonesession-2-fv-az26-851.log
./flink-vsts-standalonesession-1-fv-az26-851.log
{code}

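{{./flink-vsts-standalonesession-2-fv-az26-851.log}} is counted twice because a rolled-over log file, {{./flink-vsts-standalonesession-2-fv-az26-851.log.1}}, was created for that instance. After {{sed}} strips the numeric suffix, both names are identical, but they are not adjacent in the output, and {{uniq}} only collapses adjacent duplicate lines.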
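
The effect can be reproduced in isolation. The following is only a minimal sketch, not the actual content of common_ha.sh: it feeds the four paths from the output above through {{uniq}} and through {{sort -u}} and compares the counts. The variable names ({{paths}}, {{unsorted_count}}, {{sorted_count}}) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample input: the four paths from the output above, i.e. after the
# numeric rolling-log suffix has already been stripped by sed.
# (Variable names are hypothetical, not taken from common_ha.sh.)
paths='./flink-vsts-standalonesession-2-fv-az26-851.log
./flink-vsts-standalonesession-0-fv-az26-851.log
./flink-vsts-standalonesession-2-fv-az26-851.log
./flink-vsts-standalonesession-1-fv-az26-851.log'

# uniq alone only removes adjacent duplicates, so the repeated
# "standalonesession-2" entry survives and is counted twice.
unsorted_count=$(printf '%s\n' "$paths" | uniq | wc -l | tr -d ' ')

# sort -u sorts first and then drops duplicates, so the result no
# longer depends on the order in which grep visits the files.
sorted_count=$(printf '%s\n' "$paths" | sort -u | wc -l | tr -d ' ')

echo "uniq only: $unsorted_count"
echo "sort -u:   $sorted_count"
```

Replacing the trailing {{uniq}} in the pipeline with {{sort -u}} (or inserting {{sort}} before {{uniq}}) would be one way to make the count independent of file order; whether that is the preferred fix for the script is a separate question.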

> Running HA (hashmap, async) end-to-end test failed on azure due to unable to find master logs
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26624
>                 URL: https://issues.apache.org/jira/browse/FLINK-26624
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Yun Gao
>            Priority: Minor
>              Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> Mar 12 04:31:15 Waiting for text Completed checkpoint [1-9]* for job 699ebf9bdcb51a9fe76db5463027d34c to appear 2 of times in logs...
> grep: /home/vsts/work/_temp/debug_files/flink-logs/*standalonesession-1*.log*: No such file or directory
> Mar 12 04:31:16 Starting standalonesession daemon on host fv-az302-918.
> grep: /home/vsts/work/_temp/debug_files/flink-logs/*standalonesession-1*.log*: No such file or directory
> Mar 12 04:41:23 A timeout occurred waiting for Completed checkpoint [1-9]* for job 699ebf9bdcb51a9fe76db5463027d34c to appear 2 of times in logs.
> Mar 12 04:41:23 Stopping job timeout watchdog (with pid=272045)
> Mar 12 04:41:23 Killing JM watchdog @ 273681
> Mar 12 04:41:23 Killing TM watchdog @ 274268
> Mar 12 04:41:23 [FAIL] Test script contains errors.
> Mar 12 04:41:23 Checking of logs skipped.
> Mar 12 04:41:23 
> Mar 12 04:41:23 [FAIL] 'Running HA (hashmap, async) end-to-end test' failed after 10 minutes and 31 seconds! Test exited with exit code 1
> Mar 12 04:41:23 
> 04:41:23 ##[group]Environment Information
> Mar 12 04:41:24 Searching for .dump, .dumpstream and related files in '/home/vsts/work/1/s'
> dmesg: read kernel buffer failed: Operation not permitted
> Mar 12 04:41:28 Stopping taskexecutor daemon (pid: 272837) on host fv-az302-918.
> Mar 12 04:41:29 Stopping standalonesession daemon (pid: 274590) on host fv-az302-918.
> Mar 12 04:41:35 Stopping zookeeper...
> Mar 12 04:41:36 Stopping zookeeper daemon (pid: 272248) on host fv-az302-918.
> The STDIO streams did not close within 10 seconds of the exit event from process '/usr/bin/bash'. This may indicate a child process inherited the STDIO streams and has not yet exited.
> ##[error]Bash exited with code '1'.
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32945&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14


