Posted to issues@flink.apache.org by "Gary Yao (Jira)" <ji...@apache.org> on 2019/11/18 21:22:00 UTC

[jira] [Updated] (FLINK-14843) Streaming bucketing end-to-end test can fail with Output hash mismatch

     [ https://issues.apache.org/jira/browse/FLINK-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Yao updated FLINK-14843:
-----------------------------
    Labels: test-stability  (was: )

> Streaming bucketing end-to-end test can fail with Output hash mismatch
> ----------------------------------------------------------------------
>
>                 Key: FLINK-14843
>                 URL: https://issues.apache.org/jira/browse/FLINK-14843
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Tests
>    Affects Versions: 1.10.0
>         Environment: rev: dcc1330375826b779e4902176bb2473704dabb11
>            Reporter: Gary Yao
>            Priority: Critical
>              Labels: test-stability
>
> *Description*
> The streaming bucketing end-to-end test ({{test_streaming_bucketing.sh}}) can fail with an "Output hash mismatch" error.
> {noformat}
> Number of running task managers has reached 4.
> Job (67212178694f8b2a9bc9d9572567a53f) is running.
> Waiting until all values have been produced
> Truncating buckets
> Number of produced values 26325/60000
> Truncating buckets
> Number of produced values 31315/60000
> Truncating buckets
> Number of produced values 36735/60000
> Truncating buckets
> Number of produced values 40705/60000
> Truncating buckets
> Number of produced values 46125/60000
> Truncating buckets
> Number of produced values 51135/60000
> Truncating buckets
> Number of produced values 56555/60000
> Truncating buckets
> Number of produced values 61935/60000
> Cancelling job 67212178694f8b2a9bc9d9572567a53f.
> Cancelled job 67212178694f8b2a9bc9d9572567a53f.
> Waiting for job (67212178694f8b2a9bc9d9572567a53f) to reach terminal state CANCELED ...
> Job (67212178694f8b2a9bc9d9572567a53f) reached terminal state CANCELED
> Job 67212178694f8b2a9bc9d9572567a53f was cancelled, time to verify
> FAIL Bucketing Sink: Output hash mismatch.  Got 4e2d1859e41184a38e5bc95090fe9941, expected 01aba5ff77a0ef5e5cf6a727c248bdc3.
> head hexdump of actual:
> 0000000   (   2   ,   1   0   ,   0   ,   S   o   m   e       p   a   y
> 0000010   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   1
> 0000020   ,   S   o   m   e       p   a   y   l   o   a   d   .   .   .
> 0000030   )  \n   (   2   ,   1   0   ,   2   ,   S   o   m   e       p
> 0000040   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0
> 0000050   ,   3   ,   S   o   m   e       p   a   y   l   o   a   d   .
> 0000060   .   .   )  \n   (   2   ,   1   0   ,   4   ,   S   o   m   e
> 0000070       p   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,
> 0000080   1   0   ,   5   ,   S   o   m   e       p   a   y   l   o   a
> 0000090   d   .   .   .   )  \n   (   2   ,   1   0   ,   6   ,   S   o
> 00000a0   m   e       p   a   y   l   o   a   d   .   .   .   )  \n   (
> 00000b0   2   ,   1   0   ,   7   ,   S   o   m   e       p   a   y   l
> 00000c0   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   8   ,
> 00000d0   S   o   m   e       p   a   y   l   o   a   d   .   .   .   )
> 00000e0  \n   (   2   ,   1   0   ,   9   ,   S   o   m   e       p   a
> 00000f0   y   l   o   a   d   .   .   .   )  \n                        
> 00000fa
> Stopping taskexecutor daemon (pid: 654547) on host gyao-desktop.
> Stopping standalonesession daemon (pid: 650368) on host gyao-desktop.
> Stopping taskexecutor daemon (pid: 650812) on host gyao-desktop.
> Skipping taskexecutor daemon (pid: 651347), because it is not running anymore on gyao-desktop.
> Skipping taskexecutor daemon (pid: 651795), because it is not running anymore on gyao-desktop.
> Skipping taskexecutor daemon (pid: 652249), because it is not running anymore on gyao-desktop.
> Stopping taskexecutor daemon (pid: 653481) on host gyao-desktop.
> Stopping taskexecutor daemon (pid: 654099) on host gyao-desktop.
> [FAIL] Test script contains errors.
> Checking of logs skipped.
> [FAIL] 'flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh' failed after 2 minutes and 3 seconds! Test exited with exit code 1
> {noformat}
> *How to reproduce*
> To provoke the issue, comment out the 10-second delay that follows the restart of the first TM:
> {code:bash}
> echo "Restarting 1 TM"
> $FLINK_DIR/bin/taskmanager.sh start
> wait_for_number_of_running_tms 4
> #sleep 10
> echo "Killing 2 TMs"
> kill_random_taskmanager
> kill_random_taskmanager
> wait_for_number_of_running_tms 2
> {code}
> Command to run the test:
> {noformat}
> FLINK_DIR=build-target/ flink-end-to-end-tests/run-single-test.sh skip flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
> {noformat}
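
For readers unfamiliar with this kind of check, the following is a minimal sketch of how an "Output hash mismatch" message like the one in the log can arise. It assumes the verification concatenates all output files, sorts the lines, and compares an MD5 digest against a known value. The function names hash_sorted_output and verify_output_hash and the directory layout are hypothetical, and the actual logic in test_streaming_bucketing.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from the Flink test scripts.

# Print the MD5 digest of all lines in all files under a directory,
# sorted so that the result does not depend on file or record order.
hash_sorted_output() {
    local dir="$1"
    find "$dir" -type f -exec cat {} + | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Compare the digest of a directory against an expected value.
# Returns 0 on match, 1 on mismatch.
verify_output_hash() {
    local dir="$1" expected="$2" actual
    actual=$(hash_sorted_output "$dir")
    if [[ "$actual" != "$expected" ]]; then
        echo "Output hash mismatch.  Got $actual, expected $expected."
        return 1
    fi
    echo "Output hash matches."
}

# Example call (path and digest are placeholders):
#   verify_output_hash /path/to/output/dir 01aba5ff77a0ef5e5cf6a727c248bdc3
```

Because the lines are sorted before hashing, a check of this form ignores the order in which records were written. Under that assumption, a mismatch would point to missing, duplicated, or altered records rather than to reordering.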


