You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jason Gustafson <ja...@confluent.io> on 2016/06/16 18:44:04 UTC

[VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Hi All,

I'd like to open the vote for KIP-62. This proposal attempts to address one
of the recurring usability problems that users of the new consumer have
faced with as little impact as possible. You can read the full details
here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
.

After some discussion on this list, I think we were in agreement that this
change addresses a major part of the problem and we've left the door open
for further improvements, such as adding a heartbeat() API or a separately
configured rebalance timeout. Thanks in advance to everyone who helped
review the proposal.

-Jason

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jun Rao <ju...@confluent.io>.
Jason,

Thanks for the KIP. +1

Just one clarification. The KIP adds a rebalance timeout in the protocol,
but didn't say what value will be used. I guess we will use
max.poll.interval.ms?

Jun

On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hi All,
>
> I'd like to open the vote for KIP-62. This proposal attempts to address one
> of the recurring usability problems that users of the new consumer have
> faced with as little impact as possible. You can read the full details
> here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> .
>
> After some discussion on this list, I think we were in agreement that this
> change addresses a major part of the problem and we've left the door open
> for further improvements, such as adding a heartbeat() API or a separately
> configured rebalance timeout. Thanks in advance to everyone who helped
> review the proposal.
>
> -Jason
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Ismael Juma <is...@juma.me.uk>.
Awesome, please update the KIP page Jason. :)

Ismael

On Tue, Jun 21, 2016 at 11:14 PM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hi All,
>
> Looks like the vote has passed with +6 binding and +5 non-binding. Thanks
> everyone for help reviewing the proposal.
>
> -Jason
>
> On Mon, Jun 20, 2016 at 2:04 PM, Jay Kreps <ja...@confluent.io> wrote:
>
> > +1
> >
> > -Jay
> >
> > On Mon, Jun 20, 2016 at 1:39 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi All, I've changed the default max.poll.interval.ms in the KIP to 5
> > > minutes. Unless there are any objections, perhaps we can skip the
> revote
> > > since this is a small change. In any case, I'll leave the vote open for
> > > another day.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Mon, Jun 20, 2016 at 12:39 PM, Ismael Juma <is...@juma.me.uk>
> wrote:
> > >
> > > > On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io> wrote:
> > > >
> > > > > Also, checked exceptions? Really Ewen??? :-)
> > > > >
> > > >
> > > > Haha, yeah. I thought checked exceptions were universally disliked.
> > > People
> > > > who favour static typing tend to prefer Disjunction/Either and the
> rest
> > > > tend to prefer unchecked exceptions.
> > > >
> > > > Ismael
> > > >
> > >
> >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jason Gustafson <ja...@confluent.io>.
Thanks Guozhang. I missed Jun's question.

@Ismael Done.

On Tue, Jun 21, 2016 at 2:37 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Hello Jun,
>
> Just clarifying, it will be using the max.poll.interval.ms config, in the
> wiki we use the term "process timeout" for it which exposed in the consumer
> configs as "max.poll.interval.ms". I have updated the wiki to make it more
> clear.
>
> Guozhang
>
>
> On Tue, Jun 21, 2016 at 2:14 PM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi All,
> >
> > Looks like the vote has passed with +6 binding and +5 non-binding. Thanks
> > everyone for help reviewing the proposal.
> >
> > -Jason
> >
> > On Mon, Jun 20, 2016 at 2:04 PM, Jay Kreps <ja...@confluent.io> wrote:
> >
> > > +1
> > >
> > > -Jay
> > >
> > > On Mon, Jun 20, 2016 at 1:39 PM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi All, I've changed the default max.poll.interval.ms in the KIP to
> 5
> > > > minutes. Unless there are any objections, perhaps we can skip the
> > revote
> > > > since this is a small change. In any case, I'll leave the vote open
> for
> > > > another day.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jun 20, 2016 at 12:39 PM, Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >
> > > > > On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io>
> wrote:
> > > > >
> > > > > > Also, checked exceptions? Really Ewen??? :-)
> > > > > >
> > > > >
> > > > > Haha, yeah. I thought checked exceptions were universally disliked.
> > > > People
> > > > > who favour static typing tend to prefer Disjunction/Either and the
> > rest
> > > > > tend to prefer unchecked exceptions.
> > > > >
> > > > > Ismael
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Jun,

Just clarifying, it will be using the max.poll.interval.ms config, in the
wiki we use the term "process timeout" for it which exposed in the consumer
configs as "max.poll.interval.ms". I have updated the wiki to make it more
clear.

Guozhang


On Tue, Jun 21, 2016 at 2:14 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hi All,
>
> Looks like the vote has passed with +6 binding and +5 non-binding. Thanks
> everyone for help reviewing the proposal.
>
> -Jason
>
> On Mon, Jun 20, 2016 at 2:04 PM, Jay Kreps <ja...@confluent.io> wrote:
>
> > +1
> >
> > -Jay
> >
> > On Mon, Jun 20, 2016 at 1:39 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi All, I've changed the default max.poll.interval.ms in the KIP to 5
> > > minutes. Unless there are any objections, perhaps we can skip the
> revote
> > > since this is a small change. In any case, I'll leave the vote open for
> > > another day.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Mon, Jun 20, 2016 at 12:39 PM, Ismael Juma <is...@juma.me.uk>
> wrote:
> > >
> > > > On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io> wrote:
> > > >
> > > > > Also, checked exceptions? Really Ewen??? :-)
> > > > >
> > > >
> > > > Haha, yeah. I thought checked exceptions were universally disliked.
> > > People
> > > > who favour static typing tend to prefer Disjunction/Either and the
> rest
> > > > tend to prefer unchecked exceptions.
> > > >
> > > > Ismael
> > > >
> > >
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jason Gustafson <ja...@confluent.io>.
Hi All,

Looks like the vote has passed with +6 binding and +5 non-binding. Thanks
everyone for help reviewing the proposal.

-Jason

On Mon, Jun 20, 2016 at 2:04 PM, Jay Kreps <ja...@confluent.io> wrote:

> +1
>
> -Jay
>
> On Mon, Jun 20, 2016 at 1:39 PM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi All, I've changed the default max.poll.interval.ms in the KIP to 5
> > minutes. Unless there are any objections, perhaps we can skip the revote
> > since this is a small change. In any case, I'll leave the vote open for
> > another day.
> >
> > Thanks,
> > Jason
> >
> > On Mon, Jun 20, 2016 at 12:39 PM, Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io> wrote:
> > >
> > > > Also, checked exceptions? Really Ewen??? :-)
> > > >
> > >
> > > Haha, yeah. I thought checked exceptions were universally disliked.
> > People
> > > who favour static typing tend to prefer Disjunction/Either and the rest
> > > tend to prefer unchecked exceptions.
> > >
> > > Ismael
> > >
> >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jay Kreps <ja...@confluent.io>.
+1

-Jay

On Mon, Jun 20, 2016 at 1:39 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hi All, I've changed the default max.poll.interval.ms in the KIP to 5
> minutes. Unless there are any objections, perhaps we can skip the revote
> since this is a small change. In any case, I'll leave the vote open for
> another day.
>
> Thanks,
> Jason
>
> On Mon, Jun 20, 2016 at 12:39 PM, Ismael Juma <is...@juma.me.uk> wrote:
>
> > On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io> wrote:
> >
> > > Also, checked exceptions? Really Ewen??? :-)
> > >
> >
> > Haha, yeah. I thought checked exceptions were universally disliked.
> People
> > who favour static typing tend to prefer Disjunction/Either and the rest
> > tend to prefer unchecked exceptions.
> >
> > Ismael
> >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jason Gustafson <ja...@confluent.io>.
Hi All, I've changed the default max.poll.interval.ms in the KIP to 5
minutes. Unless there are any objections, perhaps we can skip the revote
since this is a small change. In any case, I'll leave the vote open for
another day.

Thanks,
Jason

On Mon, Jun 20, 2016 at 12:39 PM, Ismael Juma <is...@juma.me.uk> wrote:

> On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io> wrote:
>
> > Also, checked exceptions? Really Ewen??? :-)
> >
>
> Haha, yeah. I thought checked exceptions were universally disliked. People
> who favour static typing tend to prefer Disjunction/Either and the rest
> tend to prefer unchecked exceptions.
>
> Ismael
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Ismael Juma <is...@juma.me.uk>.
On Mon, Jun 20, 2016 at 9:32 PM, Jay Kreps <ja...@confluent.io> wrote:

> Also, checked exceptions? Really Ewen??? :-)
>

Haha, yeah. I thought checked exceptions were universally disliked. People
who favour static typing tend to prefer Disjunction/Either and the rest
tend to prefer unchecked exceptions.

Ismael

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jay Kreps <ja...@confluent.io>.
Jason, yeah I think raising to 3 minutes (or better yet 5 mins) would be
better since fewer people would hit it. I do think guessing is going to be
kind of annoying here since you will eventually hit the limit in prod and
curse us since it wasn't a limit you intended to set, but at least setting
a high default should make this unlikely.

Ewen, I agree people should ideally think about and set this limit. I guess
my feeling is that for people who do think about it, the default can be
infinite with no damage, they will set something reasonable and know what
to do if they hit it. For those who don't I just think our guessing a value
for all applications and not telling you is kind of annoying since it will
go off, probably in production, and you will be confused as to what is
happening.

One thing maybe we can agree is that it would help if the error message
tells you what config controls this so you have some hope of figuring out
what happened. Typically we have good intentions with these things but all
a reasonably informed person who didn't write the code can actually
conclude is something is broken, probably because Kafka is buggy (this was
the problem with the consumer originally--most of the confused people
weren't running up against the impossibility of configuring this value,
they just couldn't figure out what the heck was going on).

Also, checked exceptions? Really Ewen??? :-)

-Jay



On Mon, Jun 20, 2016 at 11:58 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hey Jay,
>
> Thanks for the comments. I'd be sorely tempted to default to an infinite
> value for max.poll.interval.ms if we offered a separate rebalance timeout,
> but it seems a little dangerous as long as we use the same setting for
> both. In the worst case, a live-locked process could indefinitely keep the
> entire group from rebalancing. I agree with Ewen as well that users really
> ought to be thinking about liveness. But I think it would be reasonable to
> increase the default I've suggested (one minute) up to 2-3 minutes. That
> combined with a sensible default for max.poll.records (which is also part
> of this KIP) should provide reasonable out-of-the-box behavior for most
> users.
>
> Thanks,
> Jason
>
> On Sun, Jun 19, 2016 at 7:28 PM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>
> > +1 on the KIP.
> >
> > On Sat, Jun 18, 2016 at 11:52 AM, Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > If we do that, shouldn't `max.poll.records` remain with the current
> > default
> > > of `Integer.MAX_VALUE`?
> > >
> > > On Sat, Jun 18, 2016 at 5:18 PM, Jay Kreps <ja...@confluent.io> wrote:
> > >
> > > > +1
> > > >
> > > > One small but important thing I think we should consider changing: I
> > > think
> > > > we should consider setting the default for max.poll.interval to
> > infinite.
> > > > Previously our definition of alive was "polls within session
> timeout".
> > > Now
> > > > our definition of alive is "pings from b/g thread w/in session
> > timeout".
> > > > The reality is that anything we set here as a default is going to be
> > too
> > > > low for some people and those people will be confused. Previously
> that
> > > was
> > > > okay because the definition of liveness required you to understand
> and
> > > > think about this setting, so getting an error meant you needed to
> think
> > > > harder. But now this is really an optional setting for people who
> care
> > to
> > > > enforce some bound, so introducing a possible failure/instability
> mode
> > by
> > > > guess that bound seems wrong. Instead I think users that know and
> care
> > > > about this should set a thoughtful max.poll.interval, but we should
> not
> > > try
> > > > to guess for those who don't.
> > > >
> > > > This seems minor but I think we've found that defaults matter more
> than
> > > > configs so it's worth being thoughtful.
> > >
> >
> > This is very, very not minor. This is a really important default and I
> > strongly disagree that setting it to infinite is a good idea. Liveness
> > isn't something you can ignore. People like to ignore it because it makes
> > writing code easier, but that just means they write broken code. We can't
> > avoid giving them the rope to hang themselves (they can always override
> the
> > setting to a very large value), but we shouldn't encourage them to do it.
> >
> > Looking at various connectors (beyond just Kafka itself, as I was looking
> > at other frameworks), there was quite a bit of code structured roughly
> as:
> >
> > while(true) {
> >   try {
> >     conn = open(url);
> >     records = consumer.poll();
> >     for (ConsumerRecord record : records) {
> >       sendRecord(conn, record);
> >     }
> >   } catch (ConnectionException e) {
> >     //ignore and retry
> >   }
> > }
> >
> > In other words, code which will never make progress under a configuration
> > or deployment error. Handling errors by ignoring them isn't rare, which
> > means you'll probably get no real indication of liveness unless you make
> it
> > explicit (i.e. you *must* call something periodically). Of course, users
> > can always continue to screw themselves over by also ignoring errors *we*
> > produce and still keeping their app alive, but at that point we've done
> all
> > we can and at least we'll have attempted to shift work to another
> instance.
> >
> > I think setting a *larger* default timeout could make sense -- there's no
> > reasonable way to set a default since the message size, number of
> messages,
> > and message processing time will all vary widely by application. But
> really
> > the important distinction is (reasonable) finite timeout vs infinite. We
> > shouldn't default to something that never lets you figure out that
> > something is wrong. (If we could come up with a finite maximum value, I
> > would absolutely want to add that as a strict maximum, but there's no
> such
> > value.) If someone exceeds what we choose as a default maximum, it is
> > *really* in their interest to understand why they need such a large
> timeout
> > and whether there are better solutions to their problem.
> >
> > I'm sure it comes across as condescending, but we actually do know better
> > than many application developers. There are implications of just dropping
> > these timeouts that don't end well for the app. Ignoring those error
> cases
> > and liveness issues works fine 99% of the time, but it's important if you
> > really care about resiliency of your app and when things break they'll
> ask
> > why we set a default value that leads to such bad results. I staunchly
> > believe that it's better to explain why there is complexity and how to
> deal
> > with it than to try to hide it when it can't really be hidden.
> >
> > -Ewen
> >
> > P.S. I would invoke checked exceptions as another case where framework
> devs
> > try to encourage app developers to avoid hanging themselves yet app devs
> > are given enough rope and end up hanging themselves, but the Kafka
> codebase
> > eschews checked exceptions, so....
> >
> >
> >
> > > >
> > > > -Jay
> > > >
> > > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <
> jason@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to open the vote for KIP-62. This proposal attempts to
> > address
> > > > one
> > > > > of the recurring usability problems that users of the new consumer
> > have
> > > > > faced with as little impact as possible. You can read the full
> > details
> > > > > here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > > .
> > > > >
> > > > > After some discussion on this list, I think we were in agreement
> that
> > > > this
> > > > > change addresses a major part of the problem and we've left the
> door
> > > open
> > > > > for further improvements, such as adding a heartbeat() API or a
> > > > separately
> > > > > configured rebalance timeout. Thanks in advance to everyone who
> > helped
> > > > > review the proposal.
> > > > >
> > > > > -Jason
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Jay,

Thanks for the comments. I'd be sorely tempted to default to an infinite
value for max.poll.interval.ms if we offered a separate rebalance timeout,
but it seems a little dangerous as long as we use the same setting for
both. In the worst case, a live-locked process could indefinitely keep the
entire group from rebalancing. I agree with Ewen as well that users really
ought to be thinking about liveness. But I think it would be reasonable to
increase the default I've suggested (one minute) up to 2-3 minutes. That
combined with a sensible default for max.poll.records (which is also part
of this KIP) should provide reasonable out-of-the-box behavior for most
users.

Thanks,
Jason

On Sun, Jun 19, 2016 at 7:28 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> +1 on the KIP.
>
> On Sat, Jun 18, 2016 at 11:52 AM, Ismael Juma <is...@juma.me.uk> wrote:
>
> > If we do that, shouldn't `max.poll.records` remain with the current
> default
> > of `Integer.MAX_VALUE`?
> >
> > On Sat, Jun 18, 2016 at 5:18 PM, Jay Kreps <ja...@confluent.io> wrote:
> >
> > > +1
> > >
> > > One small but important thing I think we should consider changing: I
> > think
> > > we should consider setting the default for max.poll.interval to
> infinite.
> > > Previously our definition of alive was "polls within session timeout".
> > Now
> > > our definition of alive is "pings from b/g thread w/in session
> timeout".
> > > The reality is that anything we set here as a default is going to be
> too
> > > low for some people and those people will be confused. Previously that
> > was
> > > okay because the definition of liveness required you to understand and
> > > think about this setting, so getting an error meant you needed to think
> > > harder. But now this is really an optional setting for people who care
> to
> > > enforce some bound, so introducing a possible failure/instability mode
> by
> > > guess that bound seems wrong. Instead I think users that know and care
> > > about this should set a thoughtful max.poll.interval, but we should not
> > try
> > > to guess for those who don't.
> > >
> > > This seems minor but I think we've found that defaults matter more than
> > > configs so it's worth being thoughtful.
> >
>
> This is very, very not minor. This is a really important default and I
> strongly disagree that setting it to infinite is a good idea. Liveness
> isn't something you can ignore. People like to ignore it because it makes
> writing code easier, but that just means they write broken code. We can't
> avoid giving them the rope to hang themselves (they can always override the
> setting to a very large value), but we shouldn't encourage them to do it.
>
> Looking at various connectors (beyond just Kafka itself, as I was looking
> at other frameworks), there was quite a bit of code structured roughly as:
>
> while(true) {
>   try {
>     conn = open(url);
>     records = consumer.poll();
>     for (ConsumerRecord record : records) {
>       sendRecord(conn, record);
>     }
>   } catch (ConnectionException e) {
>     //ignore and retry
>   }
> }
>
> In other words, code which will never make progress under a configuration
> or deployment error. Handling errors by ignoring them isn't rare, which
> means you'll probably get no real indication of liveness unless you make it
> explicit (i.e. you *must* call something periodically). Of course, users
> can always continue to screw themselves over by also ignoring errors *we*
> produce and still keeping their app alive, but at that point we've done all
> we can and at least we'll have attempted to shift work to another instance.
>
> I think setting a *larger* default timeout could make sense -- there's no
> reasonable way to set a default since the message size, number of messages,
> and message processing time will all vary widely by application. But really
> the important distinction is (reasonable) finite timeout vs infinite. We
> shouldn't default to something that never lets you figure out that
> something is wrong. (If we could come up with a finite maximum value, I
> would absolutely want to add that as a strict maximum, but there's no such
> value.) If someone exceeds what we choose as a default maximum, it is
> *really* in their interest to understand why they need such a large timeout
> and whether there are better solutions to their problem.
>
> I'm sure it comes across as condescending, but we actually do know better
> than many application developers. There are implications of just dropping
> these timeouts that don't end well for the app. Ignoring those error cases
> and liveness issues works fine 99% of the time, but it's important if you
> really care about resiliency of your app and when things break they'll ask
> why we set a default value that leads to such bad results. I staunchly
> believe that it's better to explain why there is complexity and how to deal
> with it than to try to hide it when it can't really be hidden.
>
> -Ewen
>
> P.S. I would invoke checked exceptions as another case where framework devs
> try to encourage app developers to avoid hanging themselves yet app devs
> are given enough rope and end up hanging themselves, but the Kafka codebase
> eschews checked exceptions, so....
>
>
>
> > >
> > > -Jay
> > >
> > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to open the vote for KIP-62. This proposal attempts to
> address
> > > one
> > > > of the recurring usability problems that users of the new consumer
> have
> > > > faced with as little impact as possible. You can read the full
> details
> > > > here:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > .
> > > >
> > > > After some discussion on this list, I think we were in agreement that
> > > this
> > > > change addresses a major part of the problem and we've left the door
> > open
> > > > for further improvements, such as adding a heartbeat() API or a
> > > separately
> > > > configured rebalance timeout. Thanks in advance to everyone who
> helped
> > > > review the proposal.
> > > >
> > > > -Jason
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
+1 on the KIP.

On Sat, Jun 18, 2016 at 11:52 AM, Ismael Juma <is...@juma.me.uk> wrote:

> If we do that, shouldn't `max.poll.records` remain with the current default
> of `Integer.MAX_VALUE`?
>
> On Sat, Jun 18, 2016 at 5:18 PM, Jay Kreps <ja...@confluent.io> wrote:
>
> > +1
> >
> > One small but important thing I think we should consider changing: I
> think
> > we should consider setting the default for max.poll.interval to infinite.
> > Previously our definition of alive was "polls within session timeout".
> Now
> > our definition of alive is "pings from b/g thread w/in session timeout".
> > The reality is that anything we set here as a default is going to be too
> > low for some people and those people will be confused. Previously that
> was
> > okay because the definition of liveness required you to understand and
> > think about this setting, so getting an error meant you needed to think
> > harder. But now this is really an optional setting for people who care to
> > enforce some bound, so introducing a possible failure/instability mode by
> > guess that bound seems wrong. Instead I think users that know and care
> > about this should set a thoughtful max.poll.interval, but we should not
> try
> > to guess for those who don't.
> >
> > This seems minor but I think we've found that defaults matter more than
> > configs so it's worth being thoughtful.
>

This is very, very not minor. This is a really important default and I
strongly disagree that setting it to infinite is a good idea. Liveness
isn't something you can ignore. People like to ignore it because it makes
writing code easier, but that just means they write broken code. We can't
avoid giving them the rope to hang themselves (they can always override the
setting to a very large value), but we shouldn't encourage them to do it.

Looking at various connectors (beyond just Kafka itself, as I was looking
at other frameworks), there was quite a bit of code structured roughly as:

while(true) {
  try {
    conn = open(url);
    records = consumer.poll();
    for (ConsumerRecord record : records) {
      sendRecord(conn, record);
    }
  } catch (ConnectionException e) {
    //ignore and retry
  }
}

In other words, code which will never make progress under a configuration
or deployment error. Handling errors by ignoring them isn't rare, which
means you'll probably get no real indication of liveness unless you make it
explicit (i.e. you *must* call something periodically). Of course, users
can always continue to screw themselves over by also ignoring errors *we*
produce and still keeping their app alive, but at that point we've done all
we can and at least we'll have attempted to shift work to another instance.

I think setting a *larger* default timeout could make sense -- there's no
reasonable way to set a default since the message size, number of messages,
and message processing time will all vary widely by application. But really
the important distinction is (reasonable) finite timeout vs infinite. We
shouldn't default to something that never lets you figure out that
something is wrong. (If we could come up with a finite maximum value, I
would absolutely want to add that as a strict maximum, but there's no such
value.) If someone exceeds what we choose as a default maximum, it is
*really* in their interest to understand why they need such a large timeout
and whether there are better solutions to their problem.

I'm sure it comes across as condescending, but we actually do know better
than many application developers. There are implications of just dropping
these timeouts that don't end well for the app. Ignoring those error cases
and liveness issues works fine 99% of the time, but it's important if you
really care about resiliency of your app and when things break they'll ask
why we set a default value that leads to such bad results. I staunchly
believe that it's better to explain why there is complexity and how to deal
with it than to try to hide it when it can't really be hidden.

-Ewen

P.S. I would invoke checked exceptions as another case where framework devs
try to encourage app developers to avoid hanging themselves yet app devs
are given enough rope and end up hanging themselves, but the Kafka codebase
eschews checked exceptions, so....



> >
> > -Jay
> >
> > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi All,
> > >
> > > I'd like to open the vote for KIP-62. This proposal attempts to address
> > one
> > > of the recurring usability problems that users of the new consumer have
> > > faced with as little impact as possible. You can read the full details
> > > here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > .
> > >
> > > After some discussion on this list, I think we were in agreement that
> > this
> > > change addresses a major part of the problem and we've left the door
> open
> > > for further improvements, such as adding a heartbeat() API or a
> > separately
> > > configured rebalance timeout. Thanks in advance to everyone who helped
> > > review the proposal.
> > >
> > > -Jason
> > >
> >
>



-- 
Thanks,
Ewen

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Ismael Juma <is...@juma.me.uk>.
If we do that, shouldn't `max.poll.records` remain with the current default
of `Integer.MAX_VALUE`?

On Sat, Jun 18, 2016 at 5:18 PM, Jay Kreps <ja...@confluent.io> wrote:

> +1
>
> One small but important thing I think we should consider changing: I think
> we should consider setting the default for max.poll.interval to infinite.
> Previously our definition of alive was "polls within session timeout". Now
> our definition of alive is "pings from b/g thread w/in session timeout".
> The reality is that anything we set here as a default is going to be too
> low for some people and those people will be confused. Previously that was
> okay because the definition of liveness required you to understand and
> think about this setting, so getting an error meant you needed to think
> harder. But now this is really an optional setting for people who care to
> enforce some bound, so introducing a possible failure/instability mode by
> guess that bound seems wrong. Instead I think users that know and care
> about this should set a thoughtful max.poll.interval, but we should not try
> to guess for those who don't.
>
> This seems minor but I think we've found that defaults matter more than
> configs so it's worth being thoughtful.
>
> -Jay
>
> On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi All,
> >
> > I'd like to open the vote for KIP-62. This proposal attempts to address
> one
> > of the recurring usability problems that users of the new consumer have
> > faced with as little impact as possible. You can read the full details
> > here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > .
> >
> > After some discussion on this list, I think we were in agreement that
> this
> > change addresses a major part of the problem and we've left the door open
> > for further improvements, such as adding a heartbeat() API or a
> separately
> > configured rebalance timeout. Thanks in advance to everyone who helped
> > review the proposal.
> >
> > -Jason
> >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Jay Kreps <ja...@confluent.io>.
+1

One small but important thing I think we should consider changing: I think
we should consider setting the default for max.poll.interval to infinite.
Previously our definition of alive was "polls within session timeout". Now
our definition of alive is "pings from b/g thread w/in session timeout".
The reality is that anything we set here as a default is going to be too
low for some people and those people will be confused. Previously that was
okay because the definition of liveness required you to understand and
think about this setting, so getting an error meant you needed to think
harder. But now this is really an optional setting for people who care to
enforce some bound, so introducing a possible failure/instability mode by
guess that bound seems wrong. Instead I think users that know and care
about this should set a thoughtful max.poll.interval, but we should not try
to guess for those who don't.

This seems minor but I think we've found that defaults matter more than
configs so it's worth being thoughtful.

-Jay

On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hi All,
>
> I'd like to open the vote for KIP-62. This proposal attempts to address one
> of the recurring usability problems that users of the new consumer have
> faced with as little impact as possible. You can read the full details
> here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> .
>
> After some discussion on this list, I think we were in agreement that this
> change addresses a major part of the problem and we've left the door open
> for further improvements, such as adding a heartbeat() API or a separately
> configured rebalance timeout. Thanks in advance to everyone who helped
> review the proposal.
>
> -Jason
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Manikumar Reddy <ma...@gmail.com>.
+1 (non-binding)

On Fri, Jun 17, 2016 at 3:37 PM, Rajini Sivaram <
rajinisivaram@googlemail.com> wrote:

> +1 (non-binding)
>
> On Fri, Jun 17, 2016 at 4:45 AM, Grant Henke <gh...@cloudera.com> wrote:
>
> > +1
> >
> > On Thu, Jun 16, 2016 at 8:50 PM, tao xiao <xi...@gmail.com> wrote:
> >
> > > +1
> > >
> > > On Fri, 17 Jun 2016 at 09:03 Harsha <ka...@harsha.io> wrote:
> > >
> > > > +1 (binding)
> > > > Thanks,
> > > > Harsha
> > > >
> > > > On Thu, Jun 16, 2016, at 05:46 PM, Henry Cai wrote:
> > > > > +1
> > > > >
> > > > > On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <
> > wangguoz@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1.
> > > > > > >
> > > > > > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <
> > > > jason@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I'd like to open the vote for KIP-62. This proposal attempts
> to
> > > > address
> > > > > > > one
> > > > > > > > of the recurring usability problems that users of the new
> > > consumer
> > > > have
> > > > > > > > faced with as little impact as possible. You can read the
> full
> > > > details
> > > > > > > > here:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > > > > > .
> > > > > > > >
> > > > > > > > After some discussion on this list, I think we were in
> > agreement
> > > > that
> > > > > > > this
> > > > > > > > change addresses a major part of the problem and we've left
> the
> > > > door
> > > > > > open
> > > > > > > > for further improvements, such as adding a heartbeat() API
> or a
> > > > > > > separately
> > > > > > > > configured rebalance timeout. Thanks in advance to everyone
> who
> > > > helped
> > > > > > > > review the proposal.
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >
>
>
>
> --
> Regards,
>
> Rajini
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Rajini Sivaram <ra...@googlemail.com>.
+1 (non-binding)

On Fri, Jun 17, 2016 at 4:45 AM, Grant Henke <gh...@cloudera.com> wrote:

> +1
>
> On Thu, Jun 16, 2016 at 8:50 PM, tao xiao <xi...@gmail.com> wrote:
>
> > +1
> >
> > On Fri, 17 Jun 2016 at 09:03 Harsha <ka...@harsha.io> wrote:
> >
> > > +1 (binding)
> > > Thanks,
> > > Harsha
> > >
> > > On Thu, Jun 16, 2016, at 05:46 PM, Henry Cai wrote:
> > > > +1
> > > >
> > > > On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <
> wangguoz@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1.
> > > > > >
> > > > > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <
> > > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I'd like to open the vote for KIP-62. This proposal attempts to
> > > address
> > > > > > one
> > > > > > > of the recurring usability problems that users of the new
> > consumer
> > > have
> > > > > > > faced with as little impact as possible. You can read the full
> > > details
> > > > > > > here:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > > > > .
> > > > > > >
> > > > > > > After some discussion on this list, I think we were in
> agreement
> > > that
> > > > > > this
> > > > > > > change addresses a major part of the problem and we've left the
> > > door
> > > > > open
> > > > > > > for further improvements, such as adding a heartbeat() API or a
> > > > > > separately
> > > > > > > configured rebalance timeout. Thanks in advance to everyone who
> > > helped
> > > > > > > review the proposal.
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
Regards,

Rajini

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Grant Henke <gh...@cloudera.com>.
+1

On Thu, Jun 16, 2016 at 8:50 PM, tao xiao <xi...@gmail.com> wrote:

> +1
>
> On Fri, 17 Jun 2016 at 09:03 Harsha <ka...@harsha.io> wrote:
>
> > +1 (binding)
> > Thanks,
> > Harsha
> >
> > On Thu, Jun 16, 2016, at 05:46 PM, Henry Cai wrote:
> > > +1
> > >
> > > On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <is...@juma.me.uk>
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <wa...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1.
> > > > >
> > > > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <
> > jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I'd like to open the vote for KIP-62. This proposal attempts to
> > address
> > > > > one
> > > > > > of the recurring usability problems that users of the new
> consumer
> > have
> > > > > > faced with as little impact as possible. You can read the full
> > details
> > > > > > here:
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > > > .
> > > > > >
> > > > > > After some discussion on this list, I think we were in agreement
> > that
> > > > > this
> > > > > > change addresses a major part of the problem and we've left the
> > door
> > > > open
> > > > > > for further improvements, such as adding a heartbeat() API or a
> > > > > separately
> > > > > > configured rebalance timeout. Thanks in advance to everyone who
> > helped
> > > > > > review the proposal.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by tao xiao <xi...@gmail.com>.
+1

On Fri, 17 Jun 2016 at 09:03 Harsha <ka...@harsha.io> wrote:

> +1 (binding)
> Thanks,
> Harsha
>
> On Thu, Jun 16, 2016, at 05:46 PM, Henry Cai wrote:
> > +1
> >
> > On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > +1 (binding)
> > >
> > > On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > >
> > > > +1.
> > > >
> > > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <
> jason@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to open the vote for KIP-62. This proposal attempts to
> address
> > > > one
> > > > > of the recurring usability problems that users of the new consumer
> have
> > > > > faced with as little impact as possible. You can read the full
> details
> > > > > here:
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > > .
> > > > >
> > > > > After some discussion on this list, I think we were in agreement
> that
> > > > this
> > > > > change addresses a major part of the problem and we've left the
> door
> > > open
> > > > > for further improvements, such as adding a heartbeat() API or a
> > > > separately
> > > > > configured rebalance timeout. Thanks in advance to everyone who
> helped
> > > > > review the proposal.
> > > > >
> > > > > -Jason
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Harsha <ka...@harsha.io>.
+1 (binding)
Thanks,
Harsha

On Thu, Jun 16, 2016, at 05:46 PM, Henry Cai wrote:
> +1
> 
> On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <is...@juma.me.uk> wrote:
> 
> > +1 (binding)
> >
> > On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <wa...@gmail.com>
> > wrote:
> >
> > > +1.
> > >
> > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to open the vote for KIP-62. This proposal attempts to address
> > > one
> > > > of the recurring usability problems that users of the new consumer have
> > > > faced with as little impact as possible. You can read the full details
> > > > here:
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > .
> > > >
> > > > After some discussion on this list, I think we were in agreement that
> > > this
> > > > change addresses a major part of the problem and we've left the door
> > open
> > > > for further improvements, such as adding a heartbeat() API or a
> > > separately
> > > > configured rebalance timeout. Thanks in advance to everyone who helped
> > > > review the proposal.
> > > >
> > > > -Jason
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Henry Cai <hc...@pinterest.com.INVALID>.
+1

On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <is...@juma.me.uk> wrote:

> +1 (binding)
>
> On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <wa...@gmail.com>
> wrote:
>
> > +1.
> >
> > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi All,
> > >
> > > I'd like to open the vote for KIP-62. This proposal attempts to address
> > one
> > > of the recurring usability problems that users of the new consumer have
> > > faced with as little impact as possible. You can read the full details
> > > here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > .
> > >
> > > After some discussion on this list, I think we were in agreement that
> > this
> > > change addresses a major part of the problem and we've left the door
> open
> > > for further improvements, such as adding a heartbeat() API or a
> > separately
> > > configured rebalance timeout. Thanks in advance to everyone who helped
> > > review the proposal.
> > >
> > > -Jason
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Ismael Juma <is...@juma.me.uk>.
+1 (binding)

On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <wa...@gmail.com> wrote:

> +1.
>
> On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi All,
> >
> > I'd like to open the vote for KIP-62. This proposal attempts to address
> one
> > of the recurring usability problems that users of the new consumer have
> > faced with as little impact as possible. You can read the full details
> > here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > .
> >
> > After some discussion on this list, I think we were in agreement that
> this
> > change addresses a major part of the problem and we've left the door open
> > for further improvements, such as adding a heartbeat() API or a
> separately
> > configured rebalance timeout. Thanks in advance to everyone who helped
> > review the proposal.
> >
> > -Jason
> >
>
>
>
> --
> -- Guozhang
>

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

Posted by Guozhang Wang <wa...@gmail.com>.
+1.

On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hi All,
>
> I'd like to open the vote for KIP-62. This proposal attempts to address one
> of the recurring usability problems that users of the new consumer have
> faced with as little impact as possible. You can read the full details
> here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> .
>
> After some discussion on this list, I think we were in agreement that this
> change addresses a major part of the problem and we've left the door open
> for further improvements, such as adding a heartbeat() API or a separately
> configured rebalance timeout. Thanks in advance to everyone who helped
> review the proposal.
>
> -Jason
>



-- 
-- Guozhang