Posted to user@flink.apache.org by Chirag Dewan <ch...@yahoo.in> on 2018/03/12 07:29:29 UTC

Record Delivery Guarantee with Kafka 1.0.0

Hi,
I am trying to use the Kafka 0.11 sink with the AT_LEAST_ONCE semantic and am experiencing some data loss on Task Manager failure.
It's a simple job with parallelism=1 and a single Task Manager. After a few checkpoints (Kafka flushes) I kill the Task Manager, which runs as a container on Docker Swarm.
I observe a small number of records, usually 4-5, being lost on the Kafka broker (1-broker cluster, 1 topic with 1 partition).
My FlinkKafkaProducer config is as follows:
batch.size=default (16384)
retries=3
max.in.flight.requests.per.connection=1
acks=1
As I understand it, all the messages batched by the KafkaProducer (RecordAccumulator) in its in-memory buffer are lost. Is this why I can't see my records on the broker? Or is there something I am doing terribly wrong? Any help is appreciated.
TIA,
Chirag
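
For reference, a minimal sketch of the sink setup being described, against the 0.11 connector with the AT_LEAST_ONCE semantic. The topic name, broker address, and source are placeholder assumptions, not details taken from this thread:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
    import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

    public class AtLeastOnceSinkSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);
            // Checkpoints are what trigger the sink to flush its pending records.
            env.enableCheckpointing(10_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
            props.setProperty("batch.size", "16384");             // default, as in the post
            props.setProperty("retries", "3");
            props.setProperty("max.in.flight.requests.per.connection", "1");
            props.setProperty("acks", "1");

            DataStream<String> records = env.socketTextStream("localhost", 9999); // placeholder source

            records.addSink(new FlinkKafkaProducer011<String>(
                    "my-topic", // placeholder
                    new KeyedSerializationSchemaWrapper<String>(new SimpleStringSchema()),
                    props,
                    FlinkKafkaProducer011.Semantic.AT_LEAST_ONCE));

            env.execute("at-least-once-kafka-sink");
        }
    }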
  

Re: Record Delivery Guarantee with Kafka 1.0.0

Posted by Stephan Ewen <se...@apache.org>.
This should be handled by Flink. The system does flush records on
checkpoints and does not confirm a checkpoint before all flushes are acked
back.

Did you turn on checkpointing? Without that, Flink cannot give guarantees
for exactly the reason you outlined above.
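
A minimal sketch of what "turning on checkpointing" means here, placed in the job's main() after creating the environment as in the earlier sketch (the interval is just an example value):

    // Without this call Flink never takes checkpoints, so the Kafka sink
    // never reaches the point where it flushes and waits for broker acks.
    env.enableCheckpointing(5_000, CheckpointingMode.AT_LEAST_ONCE);

CheckpointingMode here is org.apache.flink.streaming.api.CheckpointingMode.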



Re: Record Delivery Guarantee with Kafka 1.0.0

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

How are you checking that records are missing? Flink should flush to Kafka and wait for all records to be flushed when performing a checkpoint.

Best,
Aljoscha
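
One way to answer the "how are you checking" question: with a single partition and no retention kicking in, the partition's end offset equals the number of records the broker has accepted, so comparing it with the number of records the job emitted shows whether anything was lost. A sketch with a plain KafkaConsumer (broker address and topic are assumptions):

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class EndOffsetCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092"); // placeholder
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
                Map<TopicPartition, Long> end = consumer.endOffsets(Collections.singleton(tp));
                // The end offset of the single partition is the count of accepted records.
                System.out.println("records on broker: " + end.get(tp));
            }
        }
    }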



Re: Record Delivery Guarantee with Kafka 1.0.0

Posted by Chirag Dewan <ch...@yahoo.in>.
Hi,
Still stuck on this.
My understanding is that this is something Flink can't handle. If the Kafka producer's batch size is non-zero (which it ideally should be), there will be in-memory records and data loss (boundary cases). The only way I can handle this with Flink is through my checkpointing interval, which flushes any buffered records.
Is my understanding correct here? Or am I still missing something?
thanks,
Chirag
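
The "buffered records get flushed" part can be illustrated with the plain Kafka producer API; the Flink 0.11 sink relies on the same kind of blocking flush when a checkpoint is taken, which is why the checkpoint interval bounds how long records can live only in the producer's memory. A small stand-alone sketch (broker address and topic are assumptions):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class FlushSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092"); // placeholder
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("batch.size", "16384"); // non-zero batch: records sit in the RecordAccumulator first

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // send() only hands the record to the in-memory accumulator.
                producer.send(new ProducerRecord<>("my-topic", "some-record"));
                // flush() blocks until everything buffered so far has been sent
                // and acknowledged by the broker.
                producer.flush();
            }
        }
    }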