Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2022/03/18 14:36:00 UTC

[jira] [Comment Edited] (FLINK-24169) Flaky local YARN tests relying on log files

    [ https://issues.apache.org/jira/browse/FLINK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508818#comment-17508818 ] 

Matthias Pohl edited comment on FLINK-24169 at 3/18/22, 2:35 PM:
-----------------------------------------------------------------

Thanks [~zchikan] for picking up the task and sorry for not replying earlier. I missed your initial mention.

Regarding the solution in [PR #18687|https://github.com/apache/flink/pull/18687]: I'm not comfortable with changing the log4j properties of {{flink-dist}} to fix a test issue without consulting the mailing list first, because the change also affects production. Sorry for not being precise enough in the issue description. That's my bad.

But thinking about it once more, I wonder whether we could instead change the way the log files are selected for evaluation in the YARN tests (see [YarnTestBase:617|https://github.com/apache/flink/blob/da5e7574437dabb82d3cfedf7cebdf7c5ab755b6/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java#L617]): rather than only looking at the actual log file, the tests could also consider log files that have the same base name plus a {{.[0-9]}} file extension.
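
To make the idea concrete, here is a minimal sketch of such a selection. It is not the code in {{YarnTestBase}}: the class {{RolledLogFileMatcher}} and its methods are hypothetical names, and it assumes that rolled-over files are named by appending a numeric index to the base name (e.g. {{jobmanager.log.1}}). It accepts multi-digit indices as well, which is slightly broader than the single-digit {{.[0-9]}} mentioned above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper; not part of Flink. */
final class RolledLogFileMatcher {

    private RolledLogFileMatcher() {}

    /**
     * Returns true if the candidate is the base log file itself or a rolled-over
     * copy of it, i.e. the base name followed by a dot and a numeric index.
     */
    static boolean matches(String baseName, String candidate) {
        // Quote the base name so that the dots in it are matched literally.
        Pattern pattern = Pattern.compile(Pattern.quote(baseName) + "(\\.[0-9]+)?");
        return pattern.matcher(candidate).matches();
    }

    /** Lists the base log file and its rolled-over copies in the given directory. */
    static List<File> findLogFiles(File directory, String baseName) {
        File[] files = directory.listFiles((dir, name) -> matches(baseName, name));
        if (files == null) {
            // listFiles returns null if the path is not a directory or cannot be read.
            return Collections.emptyList();
        }
        return Arrays.stream(files).sorted().collect(Collectors.toList());
    }
}
```

A test could then search every file returned by {{findLogFiles}} for the expected log message instead of reading only the file with the exact base name.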


was (Author: mapohl):
Thanks [~zchikan] for picking up the task and sorry for not replying earlier. I missed your initial mention.

Talking about the solution of [PR #18687|https://github.com/apache/flink/pull/18687]. I'm not comfortable about changing the log4j properties of {{flink-dist}} to fix some issue in the tests because it also affects production without consulting the mailing list. Sorry for not being precise enough in the issue description. That's my bad.

But thinking about it once more makes me wonder whether we could change the way the log files are selected for evaluation in the YARN tests (see [YarnTestBase:617|https://github.com/apache/flink/blob/da5e7574437dabb82d3cfedf7cebdf7c5ab755b6/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java#L617]) and not only look for the actual log file but also log files with `.[0-9]` file extension but the same base name

> Flaky local YARN tests relying on log files
> -------------------------------------------
>
>                 Key: FLINK-24169
>                 URL: https://issues.apache.org/jira/browse/FLINK-24169
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>            Reporter: Matthias Pohl
>            Assignee: Zsombor Chikán
>            Priority: Major
>              Labels: pull-request-available, stale-assigned, test-stability
>             Fix For: 1.15.0
>
>
> While working on [PR #16989|https://github.com/apache/flink/pull/16989] for FLINK-23611, we experienced some flakiness when running {{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
> [~dmvk] discovered a bug in log4j (see [LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug affects the tests because they check the log files for specific log messages. The log messages end up in the wrong log file if the rolling update mechanism is triggered. This does not seem to be an issue on AzureCI due to the slower hardware used for the worker machines.
> A solution to overcome this issue would be to add a custom log4j configuration that disables the {{appender.main.policies.startup.type = OnStartupTriggeringPolicy}} setting, which is present in {{flink-dist}}'s log4j configuration.


