Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/02/09 19:39:21 UTC

[GitHub] [flink] lonerzzz opened a new pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

lonerzzz opened a new pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042
 
 
   ## What is the purpose of the change
   
   The fundamental issue is that exceptions raised in the Task class are logged at INFO level. As a result, the log level has to be kept at INFO just to see these potentially important errors. However, because INFO is intended for fairly detailed observation, the exceptions can be buried among the rest of the output.
   
   The proposed change is to log these exceptions at WARN level, so that INFO-level logging does not need to be enabled in order to see exceptions occurring in the task.
   
   ## Brief change log
   
   Changed the log level to warn when exceptions occur in the Task
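   A minimal sketch of the level selection described above, assuming a simplified logger. `TransitionLogger`, `LogLevel` and `logTransition` are hypothetical names, not Flink's `Task` API, and the real code logs through SLF4J rather than collecting strings:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch, not Flink's Task class: chooses INFO or WARN for a
   // state transition depending on whether a cause (Throwable) is attached.
   public class TransitionLogger {
       enum LogLevel { INFO, WARN }

       // Captured log lines stand in for a real SLF4J Logger so the chosen level is visible.
       static final List<String> LINES = new ArrayList<>();

       static LogLevel logTransition(String task, String from, String to, Throwable cause) {
           // No cause: a routine transition, logged at INFO as before.
           // With a cause: the transition carries an exception, so log at WARN.
           LogLevel level = (cause == null) ? LogLevel.INFO : LogLevel.WARN;
           String message = String.format("%s switched from %s to %s.", task, from, to);
           if (cause != null) {
               message += " Cause: " + cause;
           }
           LINES.add(level + " " + message);
           return level;
       }

       public static void main(String[] args) {
           logTransition("Map (1/4)", "DEPLOYING", "RUNNING", null);
           logTransition("Map (1/4)", "RUNNING", "FAILED", new IllegalStateException("boom"));
           LINES.forEach(System.out::println);
       }
   }
   ```

   Running `main` prints one INFO line for the routine transition and one WARN line for the failing one, so an operator filtering at WARN would still see the failure.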
   
   ## Verifying this change
   
   This change is a trivial rework without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): (yes / no / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-586941564
 
 
   Yes, I would be in favour of changing this to a warning. @zentol @StephanEwen ?

----------------------------------------------------------------

[GitHub] [flink] zentol commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
zentol commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-585820786
 
 
   Note that this was discussed in the past; #5399 / FLINK-6206

----------------------------------------------------------------

[GitHub] [flink] flinkbot edited a comment on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-583887563
 
 
   ## CI report:
   
   * 67b914a38db41bb391171bf55db63b84a3a7ef31 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/148107889) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4981) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------

[GitHub] [flink] aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-585841472
 
 
   Maybe that's a sign that we should finally fix https://issues.apache.org/jira/browse/FLINK-6206.

----------------------------------------------------------------

[GitHub] [flink] zentol commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#discussion_r380154394
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
 ##########
 @@ -952,7 +952,7 @@ private boolean transitionState(ExecutionState currentState, ExecutionState newS
 			if (cause == null) {
 				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState);
 			} else {
-				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
+				LOG.warn("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
 
 Review comment:
   The last argument is a Throwable, so I don't see a problem.
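   For reference, SLF4J (1.6.0 and later) treats a trailing Throwable that has no `{}` placeholder left as the exception to log, stack trace included, rather than as a message parameter. The following is a simplified, stdlib-only re-implementation of that rule for illustration; it is not SLF4J's `MessageFormatter` code, it ignores escaped placeholders, and the exact handling when the Throwable could also fill a placeholder has differed between SLF4J releases:

   ```java
   import java.util.Arrays;

   // Simplified sketch of SLF4J's trailing-Throwable rule (not SLF4J source).
   public class TrailingThrowable {
       // Counts "{}" placeholders; escaped "\\{}" is not handled in this sketch.
       static int countPlaceholders(String pattern) {
           int count = 0;
           for (int i = pattern.indexOf("{}"); i >= 0; i = pattern.indexOf("{}", i + 2)) {
               count++;
           }
           return count;
       }

       // Returns the Throwable that would be logged as the exception, or null.
       static Throwable extractThrowable(String pattern, Object... args) {
           if (args.length == 0) {
               return null;
           }
           Object last = args[args.length - 1];
           // Only an argument left over after all placeholders are filled counts.
           if (last instanceof Throwable && args.length > countPlaceholders(pattern)) {
               return (Throwable) last;
           }
           return null;
       }

       // Substitutes the remaining (non-exception) arguments into the placeholders.
       static String format(String pattern, Object... args) {
           Object[] params = extractThrowable(pattern, args) != null
                   ? Arrays.copyOf(args, args.length - 1)
                   : args;
           StringBuilder sb = new StringBuilder();
           int from = 0;
           for (Object p : params) {
               int at = pattern.indexOf("{}", from);
               if (at < 0) {
                   break;
               }
               sb.append(pattern, from, at).append(p);
               from = at + 2;
           }
           return sb.append(pattern.substring(from)).toString();
       }

       public static void main(String[] args) {
           // Same shape as the call in Task#transitionState: 4 placeholders, 5 arguments.
           String pattern = "{} ({}) switched from {} to {}.";
           Exception cause = new RuntimeException("boom");
           Object[] logArgs = {"Map (1/4)", "attempt-id", "RUNNING", "FAILED", cause};
           System.out.println(format(pattern, logArgs));
           System.out.println(extractThrowable(pattern, logArgs));
       }
   }
   ```

   With four placeholders and five arguments, the four values fill the message and `cause` is reported separately as the exception, which is why the extra argument in the diff is not a formatting error.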

----------------------------------------------------------------

[GitHub] [flink] aljoscha closed pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha closed pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042
 
 
   

----------------------------------------------------------------

[GitHub] [flink] aljoscha commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#discussion_r378965519
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
 ##########
 @@ -952,7 +952,7 @@ private boolean transitionState(ExecutionState currentState, ExecutionState newS
 			if (cause == null) {
 				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState);
 			} else {
-				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
+				LOG.warn("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
 
 Review comment:
   Never mind, you didn't change that.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] lonerzzz commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
lonerzzz commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-586757018
 
 
   @zentol @aljoscha Having read #5399, I don't think a firm position was taken there. The suggestion to log JobManager output at INFO level assumes that the system can recover, which is not true in all cases. Here are two situations I have encountered in which recovery either does not happen or happens slowly:
   
   1) Job submission failure: many errors prevent a submission from succeeding without manual intervention. If this output is only logged at INFO level, a JobManager that regularly receives job submissions must always run with INFO logging enabled, or these errors will not be visible.
   2) Rebalancing errors: in several situations I have encountered where the number of task slots is close to the number of tasks, a transient infrastructure error can leave jobs stuck awaiting deployment and rebalancing for very long periods. Recovery may eventually happen, but it can take a while; a warning would at least let operations staff intervene manually, rather than discovering later that a job in a pipeline is not processing because it is waiting for resources.

----------------------------------------------------------------

[GitHub] [flink] flinkbot edited a comment on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-583887563
 
 
   ## CI report:
   
   * 67b914a38db41bb391171bf55db63b84a3a7ef31 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148107889) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4981) 
   

----------------------------------------------------------------

[GitHub] [flink] flinkbot commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-583887563
 
 
   ## CI report:
   
   * 67b914a38db41bb391171bf55db63b84a3a7ef31 UNKNOWN
   

----------------------------------------------------------------

[GitHub] [flink] aljoscha commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#discussion_r378884891
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
 ##########
 @@ -952,7 +952,7 @@ private boolean transitionState(ExecutionState currentState, ExecutionState newS
 			if (cause == null) {
 				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState);
 			} else {
-				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
+				LOG.warn("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
 
 Review comment:
   This will not work because the number of placeholders (`{}`) does not match the number of arguments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] aljoscha commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on a change in pull request #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#discussion_r380175221
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
 ##########
 @@ -952,7 +952,7 @@ private boolean transitionState(ExecutionState currentState, ExecutionState newS
 			if (cause == null) {
 				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState);
 			} else {
-				LOG.info("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
+				LOG.warn("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId, currentState, newState, cause);
 
 Review comment:
   Yes, I realized that and wrote my second comment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-586942189
 
 
   I think the approach of this PR is also what people on #5399 agreed on.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
aljoscha commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-588092905
 
 
   Thanks, I have now merged this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-583884584
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 67b914a38db41bb391171bf55db63b84a3a7ef31 (Sun Feb 09 19:41:32 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
    * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-15744).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>

----------------------------------------------------------------