You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Dominik Safaric <do...@gmail.com> on 2016/12/12 18:54:38 UTC

Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced

Hi everyone,

As I’ve implemented a RollingSink writing messages consumed from a Kafka log, I’ve observed that there is a significant mismatch in the number of messages consumed and written to file system.

Namely, the consumed Kafka topic contains in total 1.000.000 messages. The topology does not perform any data transformation whatsoever, but instead of, data from the source is pushed straight to the RollingSink. 

After I’ve checksummed the output files, I’ve observed that the total number of messages written to the output files is greater then 7.000.000 - a different of 6.000.000 records more then consumed/available.

What is the cause of this behaviour? 

Regards,
Dominik   

Re: Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
how did you execute this? Did any failures occur in between? If yes, it can
be that the sink writes stuff multiple times but marks the valid contents
of files using a .valid-length file.

Cheers,
Aljoscha

On Mon, 12 Dec 2016 at 19:54 Dominik Safaric <do...@gmail.com>
wrote:

> Hi everyone,
>
> As I’ve implemented a RollingSink writing messages consumed from a Kafka
> log, I’ve observed that there is a significant mismatch in the number of
> messages consumed and written to file system.
>
> Namely, the consumed Kafka topic contains in total 1.000.000 messages. The
> topology does not perform any data transformation whatsoever, but instead
> of, data from the source is pushed straight to the RollingSink.
>
> After I’ve checksummed the output files, I’ve observed that the total
> number of messages written to the output files is greater then 7.000.000 -
> a different of 6.000.000 records more then consumed/available.
>
> What is the cause of this behaviour?
>
> Regards,
> Dominik