Posted to users@kafka.apache.org by Rahul Jain <ra...@gmail.com> on 2015/09/01 11:24:38 UTC

Re: Question regarding to reconnect.backoff.ms

We noticed something similar. When one of our three broker nodes went down,
metadata requests continued to go to the failed node and the producer kept
failing. We were able to make it work by increasing reconnect.backoff.ms
to 1 second.

Something similar was discussed earlier -
http://qnalist.com/questions/6002514/new-producer-metadata-update-problem-on-2-node-cluster



On Mon, Aug 31, 2015 at 11:00 PM, Steve Tian <st...@gmail.com>
wrote:

> Hi everyone,
>
> Are there any concerns with having a long reconnect.backoff.ms for the new
> Java Kafka producer (0.8.2.0/0.8.2.1)?
>
> Assume we have bootstrap.servers=host1:port1,host2:port2,host3:port3 and
> host1 is *down* from the very beginning. If a newly created Kafka producer
> chooses host1 as the first node to connect to for a metadata update, then
> that producer will keep trying host1 *only*, because the default TCP
> timeout is surely longer than the default value of reconnect.backoff.ms,
> which is 10 ms.
>
> I am thinking of setting reconnect.backoff.ms longer than N * T, where N
> is the number of nodes in bootstrap.servers and T is the default TCP
> timeout. Are there any concerns with having a long reconnect.backoff.ms
> like that? Any better solutions?
>
> Cheers, Steve
>
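
As a rough illustration of the N * T idea in the question above, a minimal
Java sketch might look like the following. The class and method names, the
20-second TCP connect timeout, and port 9092 are assumptions made for
illustration, not values from the thread; only the property names
bootstrap.servers and reconnect.backoff.ms come from the discussion. The
sketch uses java.util.Properties only, so it does not depend on the Kafka
client library.

```java
import java.util.Properties;

// Hypothetical helper: derives a reconnect.backoff.ms value from N * T.
class BackoffSketch {

    // Lower bound proposed in the thread: N (bootstrap nodes) * T (TCP connect timeout).
    public static long minBackoffMs(int bootstrapNodes, long tcpConnectTimeoutMs) {
        return (long) bootstrapNodes * tcpConnectTimeoutMs;
    }

    // Builds producer properties with a backoff strictly longer than N * T.
    public static Properties producerProps(String bootstrapServers, long tcpConnectTimeoutMs) {
        int nodes = bootstrapServers.split(",").length;
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // "+ 1" only makes the value strictly greater than N * T.
        props.setProperty("reconnect.backoff.ms",
                Long.toString(minBackoffMs(nodes, tcpConnectTimeoutMs) + 1));
        return props;
    }

    public static void main(String[] args) {
        // Assumed values: three bootstrap nodes, 20 s OS-level connect timeout.
        Properties props = producerProps("host1:9092,host2:9092,host3:9092", 20_000L);
        System.out.println(props.getProperty("reconnect.backoff.ms")); // prints 60001
    }
}
```

The resulting Properties would still need the usual serializer settings
before being handed to a producer. Note that a backoff this long presumably
also delays reconnecting to a broker that recovers quickly, which is the
trade-off the question asks about.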

Re: Question regarding to reconnect.backoff.ms

Posted by Steve Tian <st...@gmail.com>.
Got it. Thanks a lot Ewen!

Cheers, Steve


Re: Question regarding to reconnect.backoff.ms

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Steve,

I don't think there is a better solution at the moment. This is an easy
issue to miss in unit testing because generally connections to localhost
will be rejected immediately if there isn't anything listening on the port.
If you're running in an environment where this happens normally, then for
now you'll need to wait for the long timeout.

https://issues.apache.org/jira/browse/KAFKA-2120 may also alleviate the
problem by at least reducing the amount of time for the request to fail.
Depending on how adventurous you are, you could try using a version with
that patch and maybe adjust the setting lower than its default.

-Ewen
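
The difference Ewen describes between an immediate rejection and a long
timeout can be observed with a small probe. This is only a sketch using the
standard java.net.Socket API, not Kafka's NetworkClient; the class name, the
result labels, and the 2-second timeout are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical probe: classifies how a TCP connection attempt ends.
class ConnectProbe {

    public static String probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return "CONNECTED";
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMs, e.g. the host is down or packets are dropped.
            return "TIMED_OUT";
        } catch (ConnectException e) {
            // Typically "connection refused": the host is up but nothing listens on the port.
            return "REFUSED";
        } catch (IOException e) {
            return "FAILED";
        }
    }

    // Finds a local port that was free a moment ago by binding and releasing it.
    public static int unusedLocalPort() {
        try (ServerSocket server = new ServerSocket(0)) {
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // On localhost with no listener, the attempt usually fails right away.
        System.out.println(probe("127.0.0.1", unusedLocalPort(), 2_000));
    }
}
```

Against an unreachable remote host the same call would more likely run until
the timeout expires, which is the case that a 10 ms reconnect backoff does
not cover.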




-- 
Thanks,
Ewen

Re: Question regarding to reconnect.backoff.ms

Posted by Steve Tian <st...@gmail.com>.
Could the Kafka developers kindly give us some advice on this?

Cheers, Steve


Re: Question regarding to reconnect.backoff.ms

Posted by Steve Tian <st...@gmail.com>.
Thanks, Rahul! In my environment I need to set reconnect.backoff.ms
longer than the OS default TCP timeout so that NetworkClient can give the
second node a try.

I believe this is related to
https://issues.apache.org/jira/browse/KAFKA-2459 .

Cheers, Steve
