Posted to users@kafka.apache.org by Maung Than <ma...@apple.com> on 2014/06/03 23:13:45 UTC

Data loss detection

Hi, 

We are seeing less data on the brokers than we sent from the producers: 84 GB sent versus 58 GB on the brokers.

What is the best way to detect whether all data has been sent properly from the producers to the brokers?

Are there any logs that we can check on the producers?

Configuration: 5 brokers, 2 producers, no replication, async producer, acks set to 1, and no compression.

Thanks,
Maung

Re: Data loss detection

Posted by Maung Than <ma...@apple.com>.
Yes, we did. Here is some of the output:

2014-06-03 21:46:09 INFO  Producer:68 - Shutting down producer
2014-06-03 21:46:09 INFO  ProducerSendThread:68 - Begin shutting down ProducerSendThread
2014-06-03 21:46:09 INFO  ProducerSendThread:68 - Shutdown ProducerSendThread complete
2014-06-03 21:46:09 INFO  ProducerPool:68 - Closing all sync producers


On Jun 3, 2014, at 9:58 PM, Timothy Chen <tn...@gmail.com> wrote:

> By the way if you're using async producer how do you verify that you
> sent all the data from the producer?
> 
> Do you shut down the producer before you check?
> 
> Tim


Re: Data loss detection

Posted by Timothy Chen <tn...@gmail.com>.
By the way if you're using async producer how do you verify that you
sent all the data from the producer?

Do you shut down the producer before you check?

Tim

On Tue, Jun 3, 2014 at 3:27 PM, Maung Than <ma...@apple.com> wrote:
> Thanks, Tim.
>
> We are just trying to benchmark the Kafka producers, and no cluster or broker was down in this case.
>
> We are seeing far less data on the brokers (after calculating the sizes of the logs on the brokers), and there is no compression.
>
> We sent 84 GB, but the total log size on the brokers is only 58 GB.
>
> Since there is no replication, can we use an acks setting other than 1?
>
> Maung

Re: Data loss detection

Posted by Maung Than <ma...@apple.com>.
Thanks, Tim. 

We are just trying to benchmark the Kafka producers, and no cluster or broker was down in this case.

We are seeing far less data on the brokers (after calculating the sizes of the logs on the brokers), and there is no compression.

We sent 84 GB, but the total log size on the brokers is only 58 GB.

Since there is no replication, can we use an acks setting other than 1?

Maung 

On Jun 3, 2014, at 3:00 PM, Timothy Chen <tn...@gmail.com> wrote:

> Hi Maung,
> 
> If your required.acks is 1, the producer only ensures that one
> broker receives the data before success is returned to the
> client.
>
> Therefore, if that broker crashes and loses all its data, you lose
> data; the same can happen if it crashes before the data is fsynced.
>
> To keep more copies of your data for failure scenarios, you want to
> increase your required.acks to more than 1 so that you can
> tolerate failures.
>
> Also, the async producer doesn't wait until the data is sent before it
> returns, as it buffers and writes asynchronously. To ensure that each
> write with a successful response has actually been written, you want
> to use the sync producer.
> 
> Tim


Re: Data loss detection

Posted by Timothy Chen <tn...@gmail.com>.
Hi Maung,

If your required.acks is 1, the producer only ensures that one
broker receives the data before success is returned to the
client.

Therefore, if that broker crashes and loses all its data, you lose
data; the same can happen if it crashes before the data is fsynced.

To keep more copies of your data for failure scenarios, you want to
increase your required.acks to more than 1 so that you can
tolerate failures.

Also, the async producer doesn't wait until the data is sent before it
returns, as it buffers and writes asynchronously. To ensure that each
write with a successful response has actually been written, you want
to use the sync producer.

Tim

On Tue, Jun 3, 2014 at 2:13 PM, Maung Than <ma...@apple.com> wrote:
> Hi,
>
> We are seeing less data on the brokers than we sent from the producers: 84 GB sent versus 58 GB on the brokers.
>
> What is the best way to detect whether all data has been sent properly from the producers to the brokers?
>
> Are there any logs that we can check on the producers?
>
> Configuration: 5 brokers, 2 producers, no replication, async producer, acks set to 1, and no compression.
>
> Thanks,
> Maung
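
For illustration, the sync-producer setup Tim suggests above could be sketched as plain producer properties. This is only a sketch, assuming the 0.8-era producer and its property names (metadata.broker.list, producer.type, request.required.acks) as I recall them from that version's configuration docs; the class name and broker addresses are placeholders, not values from this thread.

```java
import java.util.Properties;

// Hypothetical helper: only assembles settings, it does not talk to Kafka.
final class SyncProducerSettings {

    // brokerList is a placeholder such as "broker1:9092,broker2:9092".
    static Properties build(String brokerList) {
        Properties props = new Properties();
        props.setProperty("metadata.broker.list", brokerList);
        // "sync" makes send() block until the broker has responded,
        // instead of buffering messages in a background thread.
        props.setProperty("producer.type", "sync");
        // 1 = the partition leader must acknowledge the write.
        props.setProperty("request.required.acks", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker1:9092,broker2:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties object would then be handed to the producer configuration of whichever client version is in use; check the property names against that version's documentation first.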

Re: Data loss detection

Posted by Jun Rao <ju...@gmail.com>.
It should be something like clientId-MessagesPerSec.

Thanks,

Jun
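
One way to hunt for a metric like this without scrolling through JConsole is to list the MBeans registered in the producer's own JVM and filter by a name fragment. The sketch below uses only the standard javax.management API; MetricNameFinder is a hypothetical helper, and the exact MBean name Kafka registers is an assumption that depends on the client version, so it searches for the fragment "MessagesPerSec" rather than a full name.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka.
final class MetricNameFinder {

    // Returns the canonical names of all registered MBeans that contain the fragment.
    static List<String> find(MBeanServer server, String fragment) {
        List<String> matches = new ArrayList<>();
        // queryNames(null, null) returns every MBean registered on the server.
        for (ObjectName name : server.queryNames(null, null)) {
            String canonical = name.getCanonicalName();
            if (canonical.contains(fragment)) {
                matches.add(canonical);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Must run inside the JVM that hosts the producer to see its metrics.
        MBeanServer local = ManagementFactory.getPlatformMBeanServer();
        find(local, "MessagesPerSec").forEach(System.out::println);
    }
}
```

Run in a JVM without a producer, this prints nothing, which is itself a quick check that the metric is not being registered where you are looking.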


On Wed, Jun 4, 2014 at 9:35 AM, Maung Than <ma...@apple.com> wrote:

>
> We could not find the producer message rate among the metrics in JConsole;
> could you give us some pointers?
>
> Also, we can confirm that the reduction in data is due to Avro encoding: we
> were measuring what we pass to the producer rather than the output of the
> serializer's encoder.
>
> Thanks,
> Maung

Re: Data loss detection

Posted by Maung Than <ma...@apple.com>.
We could not find the producer message rate among the metrics in JConsole; could you give us some pointers?

Also, we can confirm that the reduction in data is due to Avro encoding: we were measuring what we pass to the producer rather than the output of the serializer's encoder.

Thanks,
Maung

On Jun 3, 2014, at 10:47 PM, Maung Than <ma...@apple.com> wrote:

> Thanks, Jun. 
> 
> Will check and get back to you.
>
> We are converting JSON to Avro, and that conversion is done by the custom serializer.
>
> Our volume calculation on the producer side is based on the Avro generic record that is passed to the producer's send method, not on the encoded output of the serializer, which I believe is what actually gets sent to the broker.
>
> That could explain the gap. I am testing now without the custom serializer, and the two volumes are very close. That could be it!
> 
> Thanks,
> Maung


Re: Data loss detection

Posted by Maung Than <ma...@apple.com>.
Thanks, Jun. 

Will check and get back to you.

We are converting JSON to Avro, and that conversion is done by the custom serializer.

Our volume calculation on the producer side is based on the Avro generic record that is passed to the producer's send method, not on the encoded output of the serializer, which I believe is what actually gets sent to the broker.

That could explain the gap. I am testing now without the custom serializer, and the two volumes are very close. That could be it!

Thanks,
Maung
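
To make the measurement point above concrete: the number to compare against the broker logs is the size of the bytes leaving the serializer, not the size of the object passed to send. A minimal sketch, assuming a hypothetical CountingEncoder wrapper (not a Kafka or Avro class) around any "object to bytes" function; the UTF-8 string encoder in main is only a stand-in for the real Avro serializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical wrapper: delegates encoding and records what was actually produced.
final class CountingEncoder<T> implements Function<T, byte[]> {
    private final Function<T, byte[]> delegate;
    private final AtomicLong messages = new AtomicLong();
    private final AtomicLong encodedBytes = new AtomicLong();

    CountingEncoder(Function<T, byte[]> delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] apply(T value) {
        byte[] out = delegate.apply(value);
        // Count after encoding, so the totals reflect the payload sent to the broker.
        messages.incrementAndGet();
        encodedBytes.addAndGet(out.length);
        return out;
    }

    long messages() {
        return messages.get();
    }

    long encodedBytes() {
        return encodedBytes.get();
    }

    public static void main(String[] args) {
        // Stand-in encoder; a real setup would call the custom Avro serializer here.
        CountingEncoder<String> encoder =
                new CountingEncoder<>(s -> s.getBytes(StandardCharsets.UTF_8));
        encoder.apply("{\"id\":1}");
        encoder.apply("{\"id\":22}");
        System.out.println(encoder.messages() + " messages, "
                + encoder.encodedBytes() + " encoded bytes");
    }
}
```

Comparing encodedBytes() with the log sizes on the brokers removes the encoding difference from the picture (here 84 GB versus 58 GB, roughly a 31% reduction), so any remaining gap points at something other than serialization.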

On Jun 3, 2014, at 7:22 PM, Jun Rao <ju...@gmail.com> wrote:

> We have a metric on msg rate in both the producer and the broker. Could you
> see if they match?
> 
> Thanks,
> 
> Jun


Re: Data loss detection

Posted by Jun Rao <ju...@gmail.com>.
We have a metric on msg rate in both the producer and the broker. Could you
see if they match?

Thanks,

Jun


On Tue, Jun 3, 2014 at 2:13 PM, Maung Than <ma...@apple.com> wrote:

> Hi,
>
> We are seeing less data on the brokers than we sent from the producers:
> 84 GB sent versus 58 GB on the brokers.
>
> What is the best way to detect whether all data has been sent properly
> from the producers to the brokers?
>
> Are there any logs that we can check on the producers?
>
> Configuration: 5 brokers, 2 producers, no replication, async producer,
> acks set to 1, and no compression.
>
> Thanks,
> Maung
>