You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "SUBRAMANYA SURESH (JIRA)" <ji...@apache.org> on 2018/05/29 19:16:00 UTC

[jira] [Updated] (FLINK-9471) Job ending exceptions being logged at Info level

     [ https://issues.apache.org/jira/browse/FLINK-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SUBRAMANYA SURESH updated FLINK-9471:
-------------------------------------
    Description: 
We are using Flink SQL, I see job ending logs that are logged at info level, that makes it very hard for me to tune out the Info messages in the configuration. Note: If I do end up using Info, the same executionGraph logs the entire query for the operationGraph for every info statement, and this fills up the logs easily if we have say 100-200 queries. 

Note the "-" below indicate an entire line of execution graph for this query (redacted for privacy). 

018-03-30 03:32:09,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> - 

- 

-

-

- (208/725).}

        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.run(FutureTask.java:266)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (- 

-

-

-

-

) (208/725).

        ... 6 more

Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/067924e4-c861-4de1-823e-b255a0bf9998 in order to obtain the stream state handle

        at java.util.concurrent.FutureTask.report(FutureTask.java:122)

        at java.util.concurrent.FutureTask.get(FutureTask.java:192)

        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)

        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892)

 

  was:
We are using Flink SQL, I see job ending logs that are logged at info level, that makes it very hard for me to tune out the Info messages in the configuration. Note: If I do end up using Info, the same executionGraph logs the entire query for the operationGraph for every info statement, and this fills up the logs easily if we have say 100-200 queries. 

Note the "-" below indicate an entire line of execution graph for this query (redacted for privacy). 



2018-03-30 03:32:09,943 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 1 because: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (Map -> where: (AND(=- 

- 

- 

- 

-.}

at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (Map -> where: (AND(=(Environment, _UTF-16LE'SFDC-IT'), =(RuleMatch, _UTF-16LE'SFA'), =(LogType, _UTF-16LE'SAML-AUTH'), =(Outcome, _UTF-16LE'DENY'))), select: (proctime, CAST(_UTF-16LE'SFDC-IT') AS Environment, CollectedTimestamp, EventTimestamp, _raw, Aggregator), Map -> where: (AND(=(Environment, _UTF-16LE'SFDC-IT'), =(RuleMatch, _UTF-16LE'SFA'), =(LogType, _UTF-16LE'SAML-AUTH'), =(Outcome, _UTF-16LE'DENY'))), select: (proctime, CAST(_UTF-16LE'SFDC-IT') AS Environment, CollectedTimestamp, EventTimestamp, _raw, Aggregator)) (353/725).

... 6 more

Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/31b94717-9e6d-49b8-b64d-2a1a8ba04425 in order to obtain the stream state handle

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)

at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892)

... 5 more

Suppressed: java.lang.Exception: Could not properly cancel managed operator state future.

at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:99)

at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976)

at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)

... 5 more


> Job ending exceptions being logged at Info level
> ------------------------------------------------
>
>                 Key: FLINK-9471
>                 URL: https://issues.apache.org/jira/browse/FLINK-9471
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.4.2
>            Reporter: SUBRAMANYA SURESH
>            Priority: Major
>
> We are using Flink SQL, I see job ending logs that are logged at info level, that makes it very hard for me to tune out the Info messages in the configuration. Note: If I do end up using Info, the same executionGraph logs the entire query for the operationGraph for every info statement, and this fills up the logs easily if we have say 100-200 queries. 
> Note the "-" below indicate an entire line of execution graph for this query (redacted for privacy). 
> 018-03-30 03:32:09,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> - 
> - 
> -
> -
> - (208/725).}
>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (- 
> -
> -
> -
> -
> ) (208/725).
>         ... 6 more
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/067924e4-c861-4de1-823e-b255a0bf9998 in order to obtain the stream state handle
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)