Posted to users@kafka.apache.org by Felix GV <fe...@mate1inc.com> on 2013/08/12 23:48:15 UTC

Re: producer behavior when network is down

Async production is meant to work this way. You get neither a delivery
guarantee nor an exception, because the producer sends the message
independently of the code that called the async production function.

It is meant to be faster than sync production, but it is obviously intended
for non-critical messages.
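To make the distinction concrete, here is a toy model in plain Java. It does not use Kafka's producer API; FlakyTransport, sendSync and AsyncSender are hypothetical names for illustration only. With the network down, the sync call throws in the caller's thread, while the async send() returns normally and the failed write is only visible inside the background worker (counted here in failedWrites). The model fails the sync write immediately; in Viktor's tests quoted below, the real 0.7.2 sync producer only failed after about 16 minutes.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model, not Kafka's API: contrasts where a send failure becomes visible.
final class SendModes {

    // Hypothetical stand-in for the connection to the broker.
    static final class FlakyTransport {
        volatile boolean networkUp = true;

        void write(String message) throws IOException {
            if (!networkUp) {
                throw new IOException("Connection timed out");
            }
        }
    }

    // Sync: the write runs on the caller's thread, so the IOException
    // propagates to the code that called sendSync.
    static void sendSync(FlakyTransport transport, String message) throws IOException {
        transport.write(message);
    }

    // Async: send() only hands the message to a background thread and returns.
    // A failed write is never rethrown to the caller; here it is merely counted.
    static final class AsyncSender {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final FlakyTransport transport;
        final AtomicInteger failedWrites = new AtomicInteger();

        AsyncSender(FlakyTransport transport) {
            this.transport = transport;
        }

        void send(String message) {
            worker.execute(() -> {
                try {
                    transport.write(message);
                } catch (IOException e) {
                    failedWrites.incrementAndGet(); // invisible to the caller
                }
            });
        }

        // Waits for queued writes to finish (needed only for the demo).
        void close() {
            worker.shutdown();
            try {
                worker.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

In this sketch the only trace of the lost message is the counter; nothing is rethrown to the code that called send().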

--
Felix


On Fri, Jul 26, 2013 at 12:27 PM, Viktor Kolodrevskiy <
viktor.kolodrevskiy@gmail.com> wrote:

> Hey guys,
>
> We decided to use Kafka in our new project, and I have spent some time
> researching how the Kafka producer behaves during network connectivity
> problems.
>
> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) in one
> network:
>
> 1. Kafka server (0.7.2) + ZooKeeper.
> 2. Producer app with default settings.
> 3. Consumer app.
>
> Results of the following tests with the default sync producer settings:
>
> 1. Condition: Put the network down on machine (1) for 20 mins.
> Result: The producer keeps working for ~16 mins. The consumer does not
> receive anything.
> After ~16 mins the producer app fails (with java.io.IOException: Connection
> timed out). The consumer app does not fail.
> Messages that were generated during those 16 mins are lost!
>
> 2. Condition: Put the network down on machine (1) for 5 mins, then bring
> the network on (1) back up.
> Result: The producer app keeps working, with no exception or notification
> that the network was down.
> The consumer does not receive messages for 5 mins, but when the network on
> (1) is up again it receives all messages.
> No messages are lost.
>
> 3. Condition: Put the network down on machine (2) for 20 mins.
> Result: The producer keeps working for ~16 mins. The consumer does not
> receive anything.
> After ~16 mins the producer app fails (with java.io.IOException: Connection
> timed out). The consumer app does not fail.
> Messages that were generated during those 16 mins are lost! (Same result as
> in test #1.)
> Kafka and ZooKeeper log that the producer is disconnected.
>
> 4. Condition: Put the network down on machine (2) for 5 mins, then bring
> the network on (2) back up.
> Result: The producer app keeps working, with no exception or notification
> that the network was down.
> The consumer does not receive messages for 5 mins, but when the network on
> (2) is up again it receives all messages. (Same result as in test #2.)
> Kafka and ZooKeeper log that the producer is disconnected.
>
> 5. Condition: Kill the Kafka server (0.7.2) + ZooKeeper (kill the
> application, do not shut down the network).
> Result: The producer fails within a few seconds with
> "kafka.common.NoBrokersForPartitionException: Partition = null".
> The consumer is still running even after 25 minutes.
>
> One more interesting thing: changing the producer's connect.timeout.ms
> value did not change the ~16 mins.
>
> I played with the settings and found that the only way to reduce the time
> it takes the producer to notice that the network is down is to change one
> of two parameters: reconnect.interval or reconnect.time.interval.ms.
>
> So let's say we set reconnect.time.interval.ms=1000.
> This means that the producer reconnects to Kafka every second, so it
> finds out within 1 second that the network is down.
> The producer stops sending messages and throws "java.net.ConnectException:
> Connection timed out". This is the only way I have found so far.
> In this case we do not lose too many messages, but performance may suffer.
> Alternatively, we can set reconnect.interval=1 so that a reconnect happens
> after each message sent, and we do not lose messages at all.
>
> Then I tested the async producer (producer.type=async).
> The results were dramatic for me, because the producer does not throw any
> exception. It keeps sending messages and does not fail.
> I left it running overnight and it did not fail, even though the network
> between the Kafka server and the producer app was down.
> Playing with the async producer config parameters did not help either.
>
> My questions are:
>
> 1. Where might these 16 mins come from?
> 2. Are there any best practices for handling network outages?
> 3. Why does the async producer never throw exceptions when the network is
> down?
> 4. How can I check from a sync/async producer that messages were really
> sent?
>
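Regarding question 1, one plausible source of the ~16 minutes, assuming stock Linux TCP settings on the Ubuntu VMs, is the kernel's retransmission timeout for an established connection. The Linux ip-sysctl documentation states that the default net.ipv4.tcp_retries2 = 15 yields a timeout of about 924.6 seconds, after which the socket fails with ETIMEDOUT, whose message is "Connection timed out". That would also fit the observation that connect.timeout.ms had no effect, if that setting only governs establishing a connection. This is a hypothesis, not something confirmed in this thread; the hypothetical helper below only reproduces the arithmetic, with TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s assumed as the kernel constants.

```java
// Hypothetical helper: reproduces the Linux kernel's bound on how long an
// established TCP connection keeps retransmitting unacknowledged data before
// failing with ETIMEDOUT ("Connection timed out"). The constants are assumed
// kernel defaults (TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s), not Kafka settings.
final class TcpRetransmitTimeout {
    static final long RTO_MIN_MS = 200;
    static final long RTO_MAX_MS = 120_000;

    // retries corresponds to the sysctl net.ipv4.tcp_retries2 (default 15).
    static long timeoutMs(int retries) {
        // Doublings before the backoff reaches RTO_MAX: floor(log2(600)) = 9.
        int threshold = 63 - Long.numberOfLeadingZeros(RTO_MAX_MS / RTO_MIN_MS);
        if (retries <= threshold) {
            // Pure exponential backoff: 200 + 400 + ... = (2^(retries+1) - 1) * 200 ms.
            return ((2L << retries) - 1) * RTO_MIN_MS;
        }
        // Exponential phase up to the cap, then one RTO_MAX per remaining retry.
        return ((2L << threshold) - 1) * RTO_MIN_MS
                + (long) (retries - threshold) * RTO_MAX_MS;
    }
}
```

TcpRetransmitTimeout.timeoutMs(15) returns 924,600 ms, roughly 15.4 minutes, which is in the same range as the ~16 minutes observed in tests 1 and 3.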

Re: producer behavior when network is down

Posted by Viktor Kolodrevskiy <vi...@gmail.com>.
The goal is to use the sync producer and find out that the network is down
as soon as possible.
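The two settings described in the original message trade detection latency against connection churn. The following minimal Java model captures the rule as Viktor describes it: reconnect after reconnect.interval messages, or once reconnect.time.interval.ms has elapsed since the last connect. ReconnectPolicy and afterSend are hypothetical names, and this is a sketch of the described behavior, not Kafka 0.7's SyncProducer code.

```java
// Hypothetical model of the reconnect rule described in the thread, not
// Kafka 0.7's actual SyncProducer code: reconnect after reconnect.interval
// messages, or once reconnect.time.interval.ms has elapsed since the last
// connect, whichever comes first.
final class ReconnectPolicy {
    private final int reconnectInterval;        // messages per connection
    private final long reconnectTimeIntervalMs; // max age of a connection
    private int sentOnConnection = 0;
    private long lastConnectMs;

    ReconnectPolicy(int reconnectInterval, long reconnectTimeIntervalMs, long nowMs) {
        this.reconnectInterval = reconnectInterval;
        this.reconnectTimeIntervalMs = reconnectTimeIntervalMs;
        this.lastConnectMs = nowMs;
    }

    /** Records one sent message; returns true if the producer should reconnect now. */
    boolean afterSend(long nowMs) {
        sentOnConnection++;
        boolean reconnect = sentOnConnection >= reconnectInterval
                || nowMs - lastConnectMs >= reconnectTimeIntervalMs;
        if (reconnect) {
            // Per the thread, the fresh connect attempt is what surfaces
            // "Connection timed out" when the network is down.
            sentOnConnection = 0;
            lastConnectMs = nowMs;
        }
        return reconnect;
    }
}
```

new ReconnectPolicy(1, Long.MAX_VALUE, now) models reconnect.interval=1, which reconnects after every message; new ReconnectPolicy(Integer.MAX_VALUE, 1000, now) models reconnect.time.interval.ms=1000, which reconnects on the first send after each second has elapsed. The first notices an outage soonest but pays for a new connection on every send, which matches the performance concern raised in the thread.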

--
Viktor


Re: producer behavior when network is down

Posted by Viktor Kolodrevskiy <vi...@gmail.com>.
Felix,
the thing is that I was using the sync producer.

--
Viktor
