Posted to user@flink.apache.org by Satyam Shekhar <sa...@gmail.com> on 2020/06/28 10:27:27 UTC

Error reporting for Flink jobs

Hello,

I am using Flink as the query engine for running SQL queries on both batch
and streaming data. I use the Blink planner, in batch mode and streaming
mode respectively, for the two cases.
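
For context, the two environments are created roughly as follows (a
sketch of my setup, not the exact code; import paths are as of Flink
1.11, where batch mode for the Blink planner goes through the unified
TableEnvironment):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

// Streaming case: Blink planner in streaming mode.
EnvironmentSettings streamSettings = EnvironmentSettings.newInstance()
    .useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment streamTEnv =
    StreamTableEnvironment.create(env, streamSettings);

// Batch case: Blink planner in batch mode (unified TableEnvironment).
EnvironmentSettings batchSettings = EnvironmentSettings.newInstance()
    .useBlinkPlanner().inBatchMode().build();
TableEnvironment batchTEnv = TableEnvironment.create(batchSettings);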

In my current setup, I execute batch queries synchronously via the
StreamTableEnvironment::execute method. The job uses an OutputFormat in a
StreamTableSink to consume results and send them to the user. If there is
an error/exception in the pipeline (possibly due to user code), it is not
reported to the OutputFormat or the sink. If an error occurs after the
invocation of the write method on the OutputFormat, the implementation may
falsely assume that the result is successful and complete, since close is
called in both the success and failure cases. I can work around this by
checking for exceptions thrown by the execute method, but that adds extra
latency due to the job tear-down cost.
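
The workaround looks roughly like this (a sketch; reportFailureToUser is
a hypothetical helper on my side, and tEnv is the batch table
environment):

try {
    // Blocks until the batch job finishes. A pipeline failure surfaces
    // here as an exception rather than at the OutputFormat/sink.
    tEnv.execute("batch-query");
    // Only at this point is it safe to treat the written result as
    // complete and successful.
} catch (Exception e) {
    // close() has already been called on the OutputFormat by now, so
    // this is the only reliable failure signal.
    reportFailureToUser(e);
}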

A similar problem also exists for streaming jobs. In my setup, streaming
jobs are executed asynchronously via StreamExecutionEnvironment::executeAsync.
Since the sink interface has no methods for receiving pipeline errors, the
user code has to periodically track the job's status and manage persistent
failures on its own.
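
The tracking looks roughly like this (a sketch; handleError is a
hypothetical callback, and getJobExecutionResult takes an extra
ClassLoader argument in some Flink versions):

JobClient jobClient = env.executeAsync("streaming-query");

// The future completes exceptionally if the job fails; attaching a
// callback like this is the only way I found to observe pipeline
// errors from the client side.
jobClient.getJobExecutionResult()
    .whenComplete((result, failure) -> {
        if (failure != null) {
            handleError(failure);  // hypothetical user callback
        }
    });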

Have I missed something in the API? Or is there some other way to get
access to the error status in user code?

Regards,
Satyam

Re: Error reporting for Flink jobs

Posted by Timo Walther <tw...@apache.org>.
Hi Satyam,

I'm not aware of an API that solves all your problems at once. A common 
pattern for failures in user code is to catch the errors there and 
define a side output for the operator that pipes them to dedicated 
sinks (see the sketch below). However, such functionality does not 
exist in SQL yet. For the sink part, it might be useful to look into 
the StreamingFileSink [1], which provides better failure-handling 
guarantees. Flink 1.11 will ship with a SQL streaming file sink.
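
A minimal sketch of that side-output pattern in the DataStream API 
(Event, Result, transform(), and errorSink are placeholder names, not 
existing APIs):

// Anonymous subclass so that Flink can capture the element type.
final OutputTag<String> errorTag = new OutputTag<String>("errors") {};

SingleOutputStreamOperator<Result> main =
    input.process(new ProcessFunction<Event, Result>() {
        @Override
        public void processElement(Event value, Context ctx,
                                   Collector<Result> out) {
            try {
                out.collect(transform(value));  // placeholder user logic
            } catch (Exception e) {
                // Route the failing record to the side output instead
                // of failing the whole job.
                ctx.output(errorTag, value + ": " + e.getMessage());
            }
        }
    });

// Pipe the collected errors to a dedicated sink.
main.getSideOutput(errorTag).addSink(errorSink);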

Regards,
Timo


[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html

