Posted to dev@kafka.apache.org by vinay sharma <vi...@gmail.com> on 2016/04/25 22:52:18 UTC

No Heartbeat request on commit

Hello,

I am using client API 0.9.0.1 and am facing an issue. According to my logs, a
heartbeat request is sent on each commitSync(offsets), but after a metadata
refresh request, commits do not send any heartbeat request until the next
poll().

The KafkaConsumers I create sometimes hit a session timeout because no
heartbeat is sent, especially during longer processing times. I call
commitSync(offsets) at regular intervals to keep the session alive when
processing takes longer than usual. Everything works fine if the commit
intervals are very small or if I commit after each record, but if I commit,
say, every 12 seconds with a 30-second session timeout, I sometimes see the
consumer time out.

Any help or pointers will be much appreciated. Thanks in advance.

Regards,
Vinay sharma
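
The periodic-commit pattern described in the post above can be sketched
without a broker. This is only an illustration under stated assumptions: the
class CommitScheduler and its methods are hypothetical names, partitions are
plain strings, and the caller would hand the returned map to
commitSync(offsets) itself. The one Kafka convention it relies on is that a
committed offset is the position of the next record to read, i.e. the last
processed offset plus one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Kafka client: tracks processed offsets
// and reports when a proactive commit is due.
class CommitScheduler {
    private final long commitIntervalMs;
    private final Map<String, Long> pending = new HashMap<>();
    private long lastCommitMs;

    CommitScheduler(long commitIntervalMs, long startMs) {
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = startMs;
    }

    // Record that the record at `offset` in `partition` has been processed.
    // The stored value is offset + 1, the position a commit should carry.
    void recordProcessed(String partition, long offset) {
        pending.put(partition, offset + 1);
    }

    // Returns the offsets to commit if the interval has elapsed and something
    // is pending; otherwise returns an empty map. State is reset after a drain.
    Map<String, Long> drainIfDue(long nowMs) {
        if (nowMs - lastCommitMs < commitIntervalMs || pending.isEmpty()) {
            return new HashMap<>();
        }
        Map<String, Long> toCommit = new HashMap<>(pending);
        pending.clear();
        lastCommitMs = nowMs;
        return toCommit;
    }
}
```

In a real poll loop, drainIfDue would be called after each processed record
with the current clock time, and a non-empty result would be committed.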

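For reference, the settings discussed in this thread map to consumer
configuration keys roughly as follows. This is a sketch, not a recommended
setup: the 30-second session timeout comes from the post above, the heartbeat
interval and metadata refresh values are the ones reported in the replies,
mapping "meta refresh interval" to metadata.max.age.ms is an assumption,
disabling auto commit is assumed because commits are issued manually, and the
max.poll.records value is purely illustrative (that option exists only from
0.10). The class name ConsumerSettings is hypothetical.

```java
import java.util.Properties;

// Hypothetical holder for the consumer settings mentioned in the thread.
class ConsumerSettings {
    static Properties build() {
        Properties p = new Properties();
        // Session timeout tracked by the group coordinator.
        p.setProperty("session.timeout.ms", "30000");
        // How often a heartbeat becomes due.
        p.setProperty("heartbeat.interval.ms", "3000");
        // Assumed mapping for the "meta refresh interval" used in the tests.
        p.setProperty("metadata.max.age.ms", "40000");
        // Commits are issued manually via commitSync(offsets).
        p.setProperty("enable.auto.commit", "false");
        // 0.10+ only; the value is illustrative, not taken from the thread.
        p.setProperty("max.poll.records", "100");
        return p;
    }
}
```
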
Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Here you go. Please find attached SimpleKafkaConsumer.zip.

On Thu, Apr 28, 2016 at 5:55 PM, vinay sharma <vi...@gmail.com>
wrote:

> I sent it as an attachment. I did not zip it, so it may have been rejected
> by the mail server. Will send again shortly.
> On Apr 28, 2016 5:11 PM, "Jason Gustafson" <ja...@confluent.io> wrote:
>
>> Hey Vinay,
>>
>> Did you forget to attach the simple class?
>>
>> -Jason
>>
>> On Thu, Apr 28, 2016 at 1:14 PM, vinay sharma <vi...@gmail.com>
>> wrote:
>>
>> > I was also wondering: if commitSync acts as a heartbeat, then why do we
>> > still trigger a heartbeat request on commit? Why not just reset its timer
>> > on a successful commitSync? Or am I wrong and we do this already?
>> >
>> > Regards,
>> > Vinay
>> >
>> > On Thu, Apr 28, 2016 at 3:38 PM, vinay sharma <vinsharma.tech@gmail.com>
>> > wrote:
>> >
>> > > Hi Jason,
>> > >
>> > > Attached is a simple class with a main method. I used it to reproduce
>> > > the issue and to generate the logs that I attached earlier. The class
>> > > contains the code snippets of the poller that are relevant to the issue.
>> > >
>> > > Regards,
>> > > Vinay Sharma
>> > >
>> > > On Thu, Apr 28, 2016 at 3:30 PM, Jason Gustafson <ja...@confluent.io>
>> > > wrote:
>> > >
>> > >> Hey Vinay,
>> > >>
>> > >> Thanks, that's really helpful. It does seem like there might be a
>> > >> problem with the heartbeat trigger logic. I'll see if I can reproduce
>> > >> what you're seeing locally. Might be helpful if you share a snippet of
>> > >> your poll loop.
>> > >>
>> > >> Thanks,
>> > >> Jason
>> > >>
>> > >> On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma <vinsharma.tech@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hi Jason,
>> > >> >
>> > >> > I reverted back to KAFKA-3149. The producer still had issues related to
>> > >> > the schema, but my consumer worked.
>> > >> >
>> > >> > The consumer now works as expected. Although I did not encounter an
>> > >> > error and the generation was not marked dead by the coordinator, I still
>> > >> > see that successful heartbeat responses are not logged as expected.
>> > >> > My observations are as follows:
>> > >> > 1) A metadata refresh also triggers a heartbeat request. I say this
>> > >> > because I sometimes see 2 heartbeat responses logged just a few
>> > >> > milliseconds apart when a metadata refresh and a proactive commit
>> > >> > happened almost simultaneously.
>> > >> > 2) I still see that some commitSync requests have no heartbeat logged
>> > >> > before or after the commit, although the next proactive commit happened
>> > >> > just in time and its heartbeat request was successful, which saved the
>> > >> > session. In the attached log you can see that a poll was done at
>> > >> > 14:17:41, a commit happened at 14:17:56, and another commit happened at
>> > >> > 14:18:14. The only heartbeat response logged during this time is at
>> > >> > 14:18:14, which is 29 seconds after the poll, whereas a commit was
>> > >> > performed 15 seconds after the poll. The heartbeat interval was 3000.
>> > >> > 3) There are long pauses between heartbeat responses in the logs, which
>> > >> > should cause the session to time out, but that is not happening. This
>> > >> > implies that commits trigger a heartbeat but also act as a heartbeat.
>> > >> >
>> > >> >
>> > >> > Regards,
>> > >> > Vinay
>> > >> >
>> > >> >
>> > >> > On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <jason@confluent.io>
>> > >> > wrote:
>> > >> >
>> > >> >> Ah, yeah. That's probably caused by the new topic metadata version,
>> > >> >> which isn't supported on 0.9 brokers. To test on trunk, you'd have to
>> > >> >> upgrade the brokers as well. Either that or you can rewind to before
>> > >> >> KAFKA-3306 (which was just committed the day before yesterday)?
>> > >> >>
>> > >> >> -Jason
>> > >> >>
>> > >> >> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vinsharma.tech@gmail.com>
>> > >> >> wrote:
>> > >> >>
>> > >> >> > Hi Jason,
>> > >> >> >
>> > >> >> > I built kafka-client and tried using it, but my producers and consumers
>> > >> >> > started throwing the exception below. Is 0.10 not going to be compatible
>> > >> >> > with brokers on version 0.9.0.1? Or do I need to make some config
>> > >> >> > changes to the producers / consumers to make them compatible with
>> > >> >> > brokers on the old version? Or do I need to upgrade the brokers to the
>> > >> >> > new version as well?
>> > >> >> >
>> > >> >> > org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 17995, only 145 bytes available
>> > >> >> >     at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
>> > >> >> >     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
>> > >> >> >     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
>> > >> >> >
>> > >> >> > Regards,
>> > >> >> > Vinay Sharma
>> > >> >> >
>> > >> >> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <jason@confluent.io>
>> > >> >> > wrote:
>> > >> >> >
>> > >> >> > > Hey Vinay,
>> > >> >> > >
>> > >> >> > > Any chance you can run the same test against trunk? I'm guessing this
>> > >> >> > > might be caused by a bug in the 0.9 consumer which basically causes
>> > >> >> > > some requests to fail when a bunch of them are sent to the broker at
>> > >> >> > > the same time.
>> > >> >> > >
>> > >> >> > > -Jason
>> > >> >> > >
>> > >> >> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <vinsharma.tech@gmail.com>
>> > >> >> > > wrote:
>> > >> >> > >
>> > >> >> > > > Hi Jason,
>> > >> >> > > >
>> > >> >> > > > This makes sense. We use 0.9.0.1 and we do have the session timeout
>> > >> >> > > > set a bit high, but nothing can guarantee that processing will never
>> > >> >> > > > take longer than the session timeout. I am trying to test a proactive
>> > >> >> > > > commit approach to handle cases where processing takes an unusually
>> > >> >> > > > long time. To keep the consumer's session alive during long
>> > >> >> > > > processing, I proactively commitSync the processed records every 15
>> > >> >> > > > seconds. The session timeout I kept is 30000.
>> > >> >> > > >
>> > >> >> > > > *Problem:-*
>> > >> >> > > > With a heartbeat interval of 3000, I expect a heartbeat request to be
>> > >> >> > > > sent on each proactive commit, which happens every 15 seconds. In my
>> > >> >> > > > tests I see that this does not always happen. I see a time window of
>> > >> >> > > > more than 30 seconds in which no heartbeat is sent even though there
>> > >> >> > > > were commits during it. After this window I see a couple of successful
>> > >> >> > > > heartbeat responses until the end of the poll, but as soon as I poll
>> > >> >> > > > again and call commitSync in the next poll I get an
>> > >> >> > > > "ILLEGAL_GENERATION" error. This error always happens just after a
>> > >> >> > > > metadata refresh or in the next poll's processing after a metadata
>> > >> >> > > > refresh. I am attaching logs where I kept the metadata refresh
>> > >> >> > > > interval at 40000, 90000, and 500000.
>> > >> >> > > >
>> > >> >> > > > *Test results:-*
>> > >> >> > > > The test with metadata refresh 40000 ms ran around 70 seconds from the 1st poll.
>> > >> >> > > > The test with metadata refresh 90000 ms ran around 120 seconds from the 1st poll.
>> > >> >> > > > The test with metadata refresh 500000 ms ran around 564 seconds from the 1st poll.
>> > >> >> > > >
>> > >> >> > > > Every test falls in line with the cases above, where the generation is
>> > >> >> > > > marked dead some time after a metadata refresh. A metadata refresh
>> > >> >> > > > before the 1st poll does not create any issue, but the ones after a
>> > >> >> > > > poll and during long processing do.
>> > >> >> > > >
>> > >> >> > > > *Environment:-*
>> > >> >> > > > My setup has 3 brokers and 1 ZooKeeper node. The topic has 3
>> > >> >> > > > partitions and a replication factor of 3. Messages are already
>> > >> >> > > > published to the topic.
>> > >> >> > > >
>> > >> >> > > > *Logic used in test cases:-*
>> > >> >> > > > On each poll I initialize a map with the current committed offset
>> > >> >> > > > position of the partitions being consumed. I update this map after
>> > >> >> > > > each record is processed and use it to proactively commit every 15
>> > >> >> > > > seconds. The map is initialized again after a proactive commit.
>> > >> >> > > >
>> > >> >> > > > I am not sure what is wrong here, but I do not see any issue in the
>> > >> >> > > > code or in the offset commits. Log files and a class with a main
>> > >> >> > > > method are attached for your reference.
>> > >> >> > > >
>> > >> >> > > > Regards,
>> > >> >> > > > Vinay Sharma
>> > >> >> > > >
>> > >> >> > > >
>> > >> >> > > >
>> > >> >> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <jason@confluent.io>
>> > >> >> > > > wrote:
>> > >> >> > > >
>> > >> >> > > >> Hi Vinay,
>> > >> >> > > >>
>> > >> >> > > >> Answers below:
>> > >> >> > > >>
>> > >> >> > > >> > 1) Is it correct to say that each commitSync will trigger a
>> > >> >> > > >> > HeartBeatTask? If no heartbeat has been sent within the specified
>> > >> >> > > >> > heartbeat interval, should I then see a successful heartbeat
>> > >> >> > > >> > response or a failure message in the logs near the commitSync
>> > >> >> > > >> > success log?
>> > >> >> > > >>
>> > >> >> > > >> Not quite. Heartbeats are sent periodically according to the
>> > >> >> > > >> heartbeat.interval.ms configuration. However, since the consumer has
>> > >> >> > > >> no background thread, they can only be sent in API calls such as
>> > >> >> > > >> poll() or commitSync(). So calling commitSync() may or may not result
>> > >> >> > > >> in a heartbeat depending only on whether one is "due."
>> > >> >> > > >>
>> > >> >> > > >> > 2) Is it correct to say that a metadata refresh will not act as a
>> > >> >> > > >> > heartbeat, will not trigger heartBeatTask, and will not reset
>> > >> >> > > >> > heartBeatTask?
>> > >> >> > > >>
>> > >> >> > > >> That is correct. Metadata refreshes are not related to heartbeats.
>> > >> >> > > >>
>> > >> >> > > >> > 3) Where is a consumer session maintained? Let's say my consumer is
>> > >> >> > > >> > listening to 3 partitions on a 3-broker cluster where each broker
>> > >> >> > > >> > is the leader of 1 partition. Will each of the brokers have a
>> > >> >> > > >> > session for my consumer, or is just 1 session maintained somewhere
>> > >> >> > > >> > in common, like ZooKeeper?
>> > >> >> > > >>
>> > >> >> > > >> One of the brokers serves as the "group coordinator." When the
>> > >> >> > > >> consumer starts up, it sends a GroupCoordinator request to one of the
>> > >> >> > > >> brokers to find out who the coordinator is. Currently, coordinators
>> > >> >> > > >> are chosen from among the leaders of the partitions of the
>> > >> >> > > >> __consumer_offsets topic. This lets us take advantage of the leader
>> > >> >> > > >> election process to also handle coordinator failures. The coordinator
>> > >> >> > > >> of each group maintains state for the group and keeps track of
>> > >> >> > > >> session timeouts.
>> > >> >> > > >>
>> > >> >> > > >> > 4) In the above setup, if during long processing I commit a record
>> > >> >> > > >> > through commitSync, which triggers a heartbeat request, and a
>> > >> >> > > >> > successful response is received for it, what does this response
>> > >> >> > > >> > mean? Does it mean that my session with each broker is renewed? Or
>> > >> >> > > >> > does it mean that only the leader for the partition of the
>> > >> >> > > >> > committed record knows that my consumer is alive, and the
>> > >> >> > > >> > consumer's sessions on the other brokers will still time out?
>> > >> >> > > >>
>> > >> >> > > >> The coordinator is the only broker that is aware of a consumer's
>> > >> >> > > >> session and all offset commits are sent to it. Successful heartbeats
>> > >> >> > > >> mean that the session is still active. Heartbeats are also used to
>> > >> >> > > >> let the consumer discover when a rebalance has begun. If a new member
>> > >> >> > > >> joins the group, then the coordinator returns an error code in the
>> > >> >> > > >> heartbeat responses of the active members to let them know that they
>> > >> >> > > >> need to rejoin the group so that partitions can be rebalanced.
>> > >> >> > > >>
>> > >> >> > > >> I wouldn't get too hung up on commit/heartbeat behavior. The crux of
>> > >> >> > > >> the issue is that you need to call poll() often enough to avoid
>> > >> >> > > >> getting timed out by the coordinator. If you find this happening
>> > >> >> > > >> frequently, you probably need to increase session.timeout.ms. There's
>> > >> >> > > >> not really any downside to doing so other than that hard failures (in
>> > >> >> > > >> which the consumer can't be shut down cleanly) will take a little
>> > >> >> > > >> longer to detect. Normal shutdown doesn't have this problem. It can
>> > >> >> > > >> be difficult in 0.9 to ensure that poll() is called often enough
>> > >> >> > > >> since you don't have direct control over the amount of data returned
>> > >> >> > > >> in poll(), but we're adding an option (max.poll.records) in 0.10
>> > >> >> > > >> which hopefully can be set conservatively enough to make this problem
>> > >> >> > > >> go away.
>> > >> >> > > >>
>> > >> >> > > >> -Jason
>> > >> >> > > >>
>> > >> >> > > >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma <vinsharma.tech@gmail.com>
>> > >> >> > > >> wrote:
>> > >> >> > > >>
>> > >> >> > > >> > Hey,
>> > >> >> > > >> >
>> > >> >> > > >> > I am working on a simplified test case to check whether there is
>> > >> >> > > >> > any issue in my code. To make sure that none of my assumptions are
>> > >> >> > > >> > wrong, it would be great if you could help me find answers to the
>> > >> >> > > >> > following queries:
>> > >> >> > > >> >
>> > >> >> > > >> > 1) Is it correct to say that each commitSync will trigger a
>> > >> >> > > >> > HeartBeatTask? If no heartbeat has been sent within the specified
>> > >> >> > > >> > heartbeat interval, should I then see a successful heartbeat
>> > >> >> > > >> > response or a failure message in the logs near the commitSync
>> > >> >> > > >> > success log?
>> > >> >> > > >> > 2) Is it correct to say that a metadata refresh will not act as a
>> > >> >> > > >> > heartbeat, will not trigger heartBeatTask, and will not reset
>> > >> >> > > >> > heartBeatTask?
>> > >> >> > > >> > 3) Where is a consumer session maintained? Let's say my consumer is
>> > >> >> > > >> > listening to 3 partitions on a 3-broker cluster where each broker
>> > >> >> > > >> > is the leader of 1 partition. Will each of the brokers have a
>> > >> >> > > >> > session for my consumer, or is just 1 session maintained somewhere
>> > >> >> > > >> > in common, like ZooKeeper?
>> > >> >> > > >> > 4) In the above setup, if during long processing I commit a record
>> > >> >> > > >> > through commitSync, which triggers a heartbeat request, and a
>> > >> >> > > >> > successful response is received for it, what does this response
>> > >> >> > > >> > mean? Does it mean that my session with each broker is renewed? Or
>> > >> >> > > >> > does it mean that only the leader for the partition of the
>> > >> >> > > >> > committed record knows that my consumer is alive, and the
>> > >> >> > > >> > consumer's sessions on the other brokers will still time out?
>> > >> >> > > >> >
>> > >> >> > > >> > Regards,
>> > >> >> > > >> > Vinay Sharma
>> > >> >> > > >> >
>> > >> >> > > >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson <jason@confluent.io>
>> > >> >> > > >> > wrote:
>> > >> >> > > >> >
>> > >> >> > > >> > > Hey Vinay,
>> > >> >> > > >> > >
>> > >> >> > > >> > > Are you saying that heartbeats are not sent while a metadata
>> > >> >> > > >> > > refresh is in progress? Do you have any logs which show us the
>> > >> >> > > >> > > apparent problem?
>> > >> >> > > >> > >
>> > >> >> > > >> > > Thanks,
>> > >> >> > > >> > > Jason
>> > >> >> > > >> > >
>> > >> >> > > >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <vinsharma.tech@gmail.com>
>> > >> >> > > >> > > wrote:
>> > >> >> > > >> > >
>> > >> >> > > >> > > > Hi Ismael,
>> > >> >> > > >> > > >
>> > >> >> > > >> > > > Treating commitSync as a heartbeat will definitely resolve the
>> > >> >> > > >> > > > issue I am facing, but the reason behind my issue does not seem
>> > >> >> > > >> > > > to be what is mentioned in the defect (i.e. frequent commitSync
>> > >> >> > > >> > > > requests).
>> > >> >> > > >> > > >
>> > >> >> > > >> > > > I am sending commitSync periodically only to keep my session
>> > >> >> > > >> > > > alive when my consumer is still processing records and is close
>> > >> >> > > >> > > > to the session timeout (I tried the 10th / 12th / 15th / 20th
>> > >> >> > > >> > > > second after poll was called, where the session timeout is 30
>> > >> >> > > >> > > > seconds). I see a heartbeat response received in the logs along
>> > >> >> > > >> > > > with each commitSync call, but this stops after a metadata
>> > >> >> > > >> > > > refresh request is issued. I see in the logs that the commit
>> > >> >> > > >> > > > succeeds, but there is no "heartbeat response received" message
>> > >> >> > > >> > > > in the logs after the metadata refresh until the next poll.
>> > >> >> > > >> > > >
>> > >> >> > > >> > > > Regards,
>> > >> >> > > >> > > > Vinay Sharma
>> > >> >> > > >> > > >
>> > >> >> > > >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <ismael@juma.me.uk>
>> > >> >> > > >> > > > wrote:
>> > >> >> > > >> > > >
>> > >> >> > > >> > > > > Hi Vinay,
>> > >> >> > > >> > > > >
>> > >> >> > > >> > > > > This was fixed via https://issues.apache.org/jira/browse/KAFKA-3470
>> > >> >> > > >> > > > > (will be part of 0.10.0.0).
>> > >> >> > > >> > > > >
>> > >> >> > > >> > > > > Ismael
>
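
The point made in the quoted answers above, that a heartbeat only goes out
when an API call such as poll() or commitSync() finds one "due", can be
modeled with a small timer. This is a simplified sketch, not the client's
actual code: the class HeartbeatModel and its methods are hypothetical, and
time is passed in explicitly so the behavior can be checked without a broker.

```java
// Hypothetical model of "heartbeats are only sent from inside API calls".
class HeartbeatModel {
    private final long intervalMs;       // plays the role of heartbeat.interval.ms
    private final long sessionTimeoutMs; // plays the role of session.timeout.ms
    private long lastHeartbeatMs;

    HeartbeatModel(long intervalMs, long sessionTimeoutMs, long startMs) {
        this.intervalMs = intervalMs;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastHeartbeatMs = startMs;
    }

    // Invoked from inside an API call. Sends a heartbeat only if one is due,
    // and returns whether it did.
    boolean onApiCall(long nowMs) {
        if (nowMs - lastHeartbeatMs >= intervalMs) {
            lastHeartbeatMs = nowMs;
            return true;
        }
        return false;
    }

    // True once more than the session timeout has passed without a heartbeat.
    boolean sessionExpired(long nowMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }
}
```

With an interval of 3000 and a session timeout of 30000, a call 15 seconds
after the last heartbeat is always due in this model, which is why a commit
every 15 seconds would be expected to keep the session alive.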

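As an illustration of the quoted answer that the coordinator is chosen from
among the leaders of the __consumer_offsets partitions: the sketch below
assumes the broker maps a group id to one of those partitions by hashing the
id modulo the partition count, with the leader of that partition acting as
coordinator. That mapping is not stated in this thread, the class and method
names are hypothetical, and the partition count of 50 is the usual default
rather than a value reported here.

```java
// Illustrative only: mirrors the assumed hash-and-modulo mapping; it is not
// the broker's own code.
class CoordinatorLookup {
    // Maps a group id to a partition index of the offsets topic.
    static int offsetsPartitionFor(String groupId, int numOffsetsPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so special-case it.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numOffsetsPartitions;
    }

    public static void main(String[] args) {
        // 50 is assumed as the partition count of __consumer_offsets.
        System.out.println(offsetsPartitionFor("my-group", 50));
    }
}
```
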
Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

I sent it as an attachment. I did not zip it, so it may have been rejected by
the mail server. Will send again shortly.
On Apr 28, 2016 5:11 PM, "Jason Gustafson" <ja...@confluent.io> wrote:

> Hey Vinay,
>
> Did you forget to attach the simple class?
>
> -Jason
>
> On Thu, Apr 28, 2016 at 1:14 PM, vinay sharma <vi...@gmail.com>
> wrote:
>
> > I was also wondering: if commitSync acts as a heartbeat, then why do we
> > still trigger a heartbeat request on commit? Why not just reset its timer
> > on a successful commitSync? Or am I wrong and we do this already?
> >
> > Regards,
> > Vinay
> >
> > On Thu, Apr 28, 2016 at 3:38 PM, vinay sharma <vi...@gmail.com>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > Attached is a simple class with a main method. I used it to reproduce
> > > the issue and to generate the logs that I attached earlier. The class
> > > contains the code snippets of the poller that are relevant to the issue.
> > >
> > > Regards,
> > > Vinay Sharma
> > >
> > > On Thu, Apr 28, 2016 at 3:30 PM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > >> Hey Vinay,
> > >>
> > >> Thanks, that's really helpful. It does seem like there might be a
> > >> problem with the heartbeat trigger logic. I'll see if I can reproduce
> > >> what you're seeing locally. Might be helpful if you share a snippet of
> > >> your poll loop.
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >> On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma <vinsharma.tech@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Jason,
> > >> >
> > >> > I reverted back to KAFKA-3149. The producer still had issues related to
> > >> > the schema, but my consumer worked.
> > >> >
> > >> > The consumer now works as expected. Although I did not encounter an
> > >> > error and the generation was not marked dead by the coordinator, I still
> > >> > see that successful heartbeat responses are not logged as expected.
> > >> > My observations are as follows:
> > >> > 1) A metadata refresh also triggers a heartbeat request. I say this
> > >> > because I sometimes see 2 heartbeat responses logged just a few
> > >> > milliseconds apart when a metadata refresh and a proactive commit
> > >> > happened almost simultaneously.
> > >> > 2) I still see that some commitSync requests have no heartbeat logged
> > >> > before or after the commit, although the next proactive commit happened
> > >> > just in time and its heartbeat request was successful, which saved the
> > >> > session. In the attached log you can see that a poll was done at
> > >> > 14:17:41, a commit happened at 14:17:56, and another commit happened at
> > >> > 14:18:14. The only heartbeat response logged during this time is at
> > >> > 14:18:14, which is 29 seconds after the poll, whereas a commit was
> > >> > performed 15 seconds after the poll. The heartbeat interval was 3000.
> > >> > 3) There are long pauses between heartbeat responses in the logs, which
> > >> > should cause the session to time out, but that is not happening. This
> > >> > implies that commits trigger a heartbeat but also act as a heartbeat.
> > >> >
> > >> >
> > >> > Regards,
> > >> > Vinay
> > >> >
> > >> >
> > >> > On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <jason@confluent.io>
> > >> > wrote:
> > >> >
> > >> >> Ah, yeah. That's probably caused by the new topic metadata version,
> > >> >> which isn't supported on 0.9 brokers. To test on trunk, you'd have to
> > >> >> upgrade the brokers as well. Either that or you can rewind to before
> > >> >> KAFKA-3306 (which was just committed the day before yesterday)?
> > >> >>
> > >> >> -Jason
> > >> >>
> > >> >> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vinsharma.tech@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi Jason,
> > >> >> >
> > >> >> > I built kafka-client and tried using it, but my producers and consumers
> > >> >> > started throwing the exception below. Is 0.10 not going to be compatible
> > >> >> > with brokers on version 0.9.0.1? Or do I need to make some config
> > >> >> > changes to the producers / consumers to make them compatible with
> > >> >> > brokers on the old version? Or do I need to upgrade the brokers to the
> > >> >> > new version as well?
> > >> >> >
> > >> >> > org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 17995, only 145 bytes available
> > >> >> >     at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> > >> >> >     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
> > >> >> >     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
> > >> >> >
> > >> >> > Regards,
> > >> >> > Vinay Sharma
> > >> >> >
> > >> >> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <jason@confluent.io>
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hey Vinay,
> > >> >> > >
> > >> >> > > Any chance you can run the same test against trunk? I'm guessing this
> > >> >> > > might be caused by a bug in the 0.9 consumer which basically causes
> > >> >> > > some requests to fail when a bunch of them are sent to the broker at
> > >> >> > > the same time.
> > >> >> > >
> > >> >> > > -Jason
> > >> >> > >
> > >> >> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <vinsharma.tech@gmail.com>
> > >> >> > > wrote:
> > >> >> > >
> > >> >> > > > Hi Jason,
> > >> >> > > >
> > >> >> > > > This makes sense. We use 0.9.0.1 and we do have the session timeout
> > >> >> > > > set a bit high, but nothing can guarantee that processing will never
> > >> >> > > > take longer than the session timeout. I am trying to test a proactive
> > >> >> > > > commit approach to handle cases where processing takes an unusually
> > >> >> > > > long time. To keep the consumer's session alive during long
> > >> >> > > > processing, I proactively commitSync the processed records every 15
> > >> >> > > > seconds. The session timeout I kept is 30000.
> > >> >> > > >
> > >> >> > > > *Problem:-*
> > >> >> > > > With a heartbeat interval of 3000, I expect a heartbeat request to be
> > >> >> > > > sent on each proactive commit, which happens every 15 seconds. In my
> > >> >> > > > tests I see that this does not always happen. I see a time window of
> > >> >> > > > more than 30 seconds in which no heartbeat is sent even though there
> > >> >> > > > were commits during it. After this window I see a couple of successful
> > >> >> > > > heartbeat responses until the end of the poll, but as soon as I poll
> > >> >> > > > again and call commitSync in the next poll I get an
> > >> >> > > > "ILLEGAL_GENERATION" error. This error always happens just after a
> > >> >> > > > metadata refresh or in the next poll's processing after a metadata
> > >> >> > > > refresh. I am attaching logs where I kept the metadata refresh
> > >> >> > > > interval at 40000, 90000, and 500000.
> > >> >> > > >
> > >> >> > > > *Test results:-*
> > >> >> > > > The test with metadata refresh 40000 ms ran around 70 seconds from the 1st poll.
> > >> >> > > > The test with metadata refresh 90000 ms ran around 120 seconds from the 1st poll.
> > >> >> > > > The test with metadata refresh 500000 ms ran around 564 seconds from the 1st poll.
> > >> >> > > >
> > >> >> > > > Every test falls in line with the cases above, where the generation is
> > >> >> > > > marked dead some time after a metadata refresh. A metadata refresh
> > >> >> > > > before the 1st poll does not create any issue, but the ones after a
> > >> >> > > > poll and during long processing do.
> > >> >> > > >
> > >> >> > > > *Environment:-*
> > >> >> > > > My setup has 3 brokers and 1 ZooKeeper node. The topic has 3
> > >> >> > > > partitions and a replication factor of 3. Messages are already
> > >> >> > > > published to the topic.
> > >> >> > > >
> > >> >> > > > *Logic used in test cases:-*
> > >> >> > > > On each poll I initialize a map with the current committed offset
> > >> >> > > > position of the partitions being consumed. I update this map after
> > >> >> > > > each record is processed and use it to proactively commit every 15
> > >> >> > > > seconds. The map is initialized again after a proactive commit.
> > >> >> > > >
> > >> >> > > > I am not sure what is wrong here, but I do not see any issue in the
> > >> >> > > > code or in the offset commits. Log files and a class with a main
> > >> >> > > > method are attached for your reference.
> > >> >> > > >
> > >> >> > > > Regards,
> > >> >> > > > Vinay Sharma
> > >> >> > > >
> > >> >> > > >
> > >> >> > > >
>
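
The proactive-commit bookkeeping described in the quoted message (remember the latest processed offset per partition, commit every 15 seconds, then start over) could be sketched roughly as below. This is only a minimal illustration under stated assumptions, not the class attached to the thread: ProactiveCommitTracker, recordProcessed and drainIfDue are hypothetical names, partitions are plain ints rather than Kafka's TopicPartition, and the sketch assumes the value to commit is the last processed offset plus one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: collects processed offsets and reports when a proactive commit is due. */
class ProactiveCommitTracker {
    private final long commitIntervalMs;
    // partition -> next offset to commit (assumption: last processed offset + 1)
    private final Map<Integer, Long> pending = new HashMap<>();
    private long lastCommitMs;

    ProactiveCommitTracker(long commitIntervalMs, long startMs) {
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = startMs;
    }

    /** Call after each record has been processed. */
    void recordProcessed(int partition, long offset) {
        pending.put(partition, offset + 1);
    }

    /**
     * Returns the offsets to commit if the interval has elapsed and something
     * is pending; otherwise returns an empty map. The pending map is reset
     * after a non-empty result, mirroring "map is initialized again".
     */
    Map<Integer, Long> drainIfDue(long nowMs) {
        if (nowMs - lastCommitMs < commitIntervalMs || pending.isEmpty()) {
            return new HashMap<>();
        }
        Map<Integer, Long> toCommit = new HashMap<>(pending);
        pending.clear();
        lastCommitMs = nowMs;
        return toCommit;
    }
}
```

In a real poll loop the caller would convert a non-empty result into the offsets map expected by commitSync; that conversion is left out here so the sketch stays free of client dependencies.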

Re: No Heartbeat request on commit

Posted by Jason Gustafson <ja...@confluent.io>.

Hey Vinay,

Did you forget to attach the simple class?

-Jason

On Thu, Apr 28, 2016 at 1:14 PM, vinay sharma <vi...@gmail.com>
wrote:

> I was also wondering: if commitSync acts as a heartbeat, then why do we
> still trigger a heartbeat request on commit? Why not just reset its timer
> on a successful commitSync? Or am I wrong and we do this already?
>
> Regards,
> Vinay
>
> On Thu, Apr 28, 2016 at 3:38 PM, vinay sharma <vi...@gmail.com>
> wrote:
>
> > Hi Jason,
> >
> > Attached is a simple class with a main method. I used this to reproduce
> > the issue and generate the logs that I attached earlier. This class has
> > the code snippets of the poller relevant to the issue.
> >
> > Regards,
> > Vinay Sharma
> >
> > On Thu, Apr 28, 2016 at 3:30 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> >> Hey Vinay,
> >>
> >> Thanks, that's really helpful. It does seem like there might be a problem
> >> with the heartbeat trigger logic. I'll see if I can reproduce what you're
> >> seeing locally. Might be helpful if you share a snippet of your poll loop.
> >>
> >> Thanks,
> >> Jason
> >>
> >> On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma <vinsharma.tech@gmail.com>
> >> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > I reverted back to KAFKA-3149. The producer still had issues related to
> >> > the schema, but my consumer worked.
> >> >
> >> > Now the consumer worked as expected. Although I did not encounter an
> >> > error and the generation was not marked dead by the coordinator, I still
> >> > see that successful heartbeat responses are not logged as expected.
> >> > My observations are the following:-
> >> > 1) A metadata refresh also triggers a heartbeat request. I say this
> >> > because sometimes I see 2 heartbeat responses logged just a few
> >> > milliseconds apart where a metadata refresh and a proactive commit
> >> > happened almost simultaneously.
> >> > 2) I still see that some commitSync requests do not have a heartbeat
> >> > logged before or after the commit, although the next proactive commit
> >> > happened just in time, and this time the heartbeat request was successful
> >> > and hence saved the session. In the attached log you can see that a poll
> >> > was done at 14:17:41, a commit happened at 14:17:56 and another commit
> >> > happened at 14:18:14. The only heartbeat response logged during this time
> >> > is at 14:18:14, which is 29 seconds after the poll, whereas a commit was
> >> > performed 15 seconds after the poll. The heartbeat interval was 3000.
> >> > 3) There are long pauses between heartbeat responses in the logs which
> >> > should cause the session to time out, but that is not happening. This
> >> > implies that commits trigger a heartbeat but also act as a heartbeat
> >> > themselves.
> >> >
> >> >
> >> > Regards,
> >> > Vinay
> >> >
> >> >
> >> > On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <jason@confluent.io>
> >> > wrote:
> >> >
> >> >> Ah, yeah. That's probably caused by the new topic metadata version, which
> >> >> isn't supported on 0.9 brokers. To test on trunk, you'd have to upgrade
> >> >> the brokers as well. Either that or you can rewind to before KAFKA-3306
> >> >> (which was just committed the day before yesterday)?
> >> >>
> >> >> -Jason
> >> >>
> >> >> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vinsharma.tech@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Hi Jason,
> >> >> >
> >> >> > I built kafka-client and tried using it, but my producers and consumers
> >> >> > started throwing the exception below. Is 0.10 not going to be compatible
> >> >> > with brokers on version 0.9.0.1? Or do I need to make some config
> >> >> > changes to producers / consumers to make them compatible with brokers on
> >> >> > the old version? Or do I need to upgrade the brokers to the new version
> >> >> > as well?
> >> >> >
> >> >> >  org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 17995, only 145 bytes available
> >> >> > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> >> >> > at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
> >> >> > at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
> >> >> >
> >> >> > Regards,
> >> >> > Vinay Sharma
> >> >> >
> >> >> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <jason@confluent.io>
> >> >> > wrote:
> >> >> >
> >> >> > > Hey Vinay,
> >> >> > >
> >> >> > > Any chance you can run the same test against trunk? I'm guessing this
> >> >> > > might be caused by a bug in the 0.9 consumer which basically causes
> >> >> > > some requests to fail when a bunch of them are sent to the broker at
> >> >> > > the same time.
> >> >> > >
> >> >> > > -Jason
> >> >> > >
> >> >> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <vinsharma.tech@gmail.com>
> >> >> > > wrote:
> >> >> > >
> >> >> > > > Hi Jason,
> >> >> > > >
> >> >> > > > This makes sense. We use 0.9.0.1 and we do have the session timeout
> >> >> > > > set a bit high, but nothing can guarantee that processing will never
> >> >> > > > take longer than the session timeout. I am trying to test a proactive
> >> >> > > > commit approach to handle such cases, when processing takes an
> >> >> > > > unusually long time. To keep the consumer's session alive during long
> >> >> > > > processing I proactively commitSync processed records every 15
> >> >> > > > seconds. The session timeout I set is 30000.
> >> >> > > >
> >> >> > > > *Problem:-*
> >> >> > > > With a heartbeat interval of 3000, I expect a heartbeat request to be
> >> >> > > > sent on each proactive commit, which happens every 15 seconds. In my
> >> >> > > > tests I see that this does not always happen. I see a time window
> >> >> > > > greater than 30 seconds in which no heartbeat is sent even though
> >> >> > > > there were commits during it. After this window I see a couple of
> >> >> > > > successful heartbeat responses until the end of the poll, but as soon
> >> >> > > > as I poll again and call commitSync in the next poll I get an
> >> >> > > > "ILLEGAL_GENERATION" error. This error always happens just after a
> >> >> > > > metadata refresh, or in the next poll's processing after a metadata
> >> >> > > > refresh. I am attaching logs where I kept the metadata refresh
> >> >> > > > interval at 40000, 90000 and 500000.
> >> >> > > >
> >> >> > > > *Test results:-*
> >> >> > > > Test with metadata refresh 40000 ms ran around 70 seconds from 1st poll.
> >> >> > > > Test with metadata refresh 90000 ms ran around 120 seconds from 1st poll.
> >> >> > > > Test with metadata refresh 500000 ms ran around 564 seconds from 1st poll.
> >> >> > > >
> >> >> > > > Every test falls in line with the above test cases, where the
> >> >> > > > generation is marked dead some time after a metadata refresh. A
> >> >> > > > metadata refresh before the 1st poll does not create any issue, but
> >> >> > > > the ones after a poll and during long processing do.
> >> >> > > >
> >> >> > > > *Environment:-*
> >> >> > > > My setup has 3 brokers and 1 ZooKeeper node. The topic has 3
> >> >> > > > partitions and replication factor 3. Messages are already published
> >> >> > > > to the topic.
> >> >> > > >
> >> >> > > > *Logic used in test cases:-*
> >> >> > > > On each poll I initialize a map with the current committed offset
> >> >> > > > position of the partitions being consumed. I update this map after
> >> >> > > > each record is processed and use it to proactively commit every 15
> >> >> > > > seconds. The map is initialized again after a proactive commit.
> >> >> > > >
> >> >> > > > I am not sure what is wrong here, but I do not see any issue in the
> >> >> > > > code or in the offset commits. Log files and a class with a main
> >> >> > > > method are attached for your reference.
> >> >> > > >
> >> >> > > > Regards,
> >> >> > > > Vinay Sharma
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <jason@confluent.io>
> >> >> > > > wrote:
> >> >> > > >
> >> >> > > >> Hi Vinay,
> >> >> > > >>
> >> >> > > >> Answers below:
> >> >> > > >>
> >> >> > > >> > 1) Is it correct to say that each commitSync will trigger a
> >> >> > > >> > HeartBeatTask? If no heartbeat has been sent within the specified
> >> >> > > >> > heartbeat interval, should I then see a successful heartbeat
> >> >> > > >> > response or a failure message in the logs near the commitSync
> >> >> > > >> > success log?
> >> >> > > >>
> >> >> > > >> Not quite. Heartbeats are sent periodically according to the
> >> >> > > >> heartbeat.interval.ms configuration. However, since the consumer has
> >> >> > > >> no background thread, they can only be sent in API calls such as
> >> >> > > >> poll() or commitSync(). So calling commitSync() may or may not result
> >> >> > > >> in a heartbeat depending only on whether one is "due."
> >> >> > > >>
> >> >> > > >> > 2) Is it correct to say that a metadata refresh will not act as a
> >> >> > > >> > heartbeat, will not trigger heartBeatTask and will not reset
> >> >> > > >> > heartBeatTask?
> >> >> > > >>
> >> >> > > >> That is correct. Metadata refreshes are not related to heartbeats.
> >> >> > > >>
> >> >> > > >> > 3) Where is a consumer session maintained? Let's say my consumer is
> >> >> > > >> > listening to 3 partitions on a 3-broker cluster where each broker
> >> >> > > >> > is the leader of 1 partition. Will each of the brokers have a
> >> >> > > >> > session for my consumer, or is there just 1 session maintained
> >> >> > > >> > somewhere in common, like ZooKeeper?
> >> >> > > >>
> >> >> > > >> One of the brokers serves as the "group coordinator." When the
> >> >> > > >> consumer starts up, it sends a GroupCoordinator request to one of the
> >> >> > > >> brokers to find out who the coordinator is. Currently, coordinators
> >> >> > > >> are chosen from among the leaders of the partitions of the
> >> >> > > >> __consumer_offsets topic. This lets us take advantage of the leader
> >> >> > > >> election process to also handle coordinator failures. The coordinator
> >> >> > > >> of each group maintains state for the group and keeps track of
> >> >> > > >> session timeouts.
> >> >> > > >>
> >> >> > > >> > 4) In the above setup, during long processing, if I commit a record
> >> >> > > >> > through commitSync, which triggers a heartbeat request, and a
> >> >> > > >> > successful response is received for it, then what does this
> >> >> > > >> > response mean? Does it mean that my session with each broker is
> >> >> > > >> > renewed? Or does it mean that just the leader for the partition of
> >> >> > > >> > the committed record knows that my consumer is alive, and the
> >> >> > > >> > consumer's session on the other brokers will still time out?
> >> >> > > >>
> >> >> > > >> The coordinator is the only broker that is aware of a consumer's
> >> >> > > >> session and all offset commits are sent to it. Successful heartbeats
> >> >> > > >> mean that the session is still active. Heartbeats are also used to
> >> >> > > >> let the consumer discover when a rebalance has begun. If a new member
> >> >> > > >> joins the group, then the coordinator returns an error code in the
> >> >> > > >> heartbeat responses of the active members to let them know that they
> >> >> > > >> need to rejoin the group so that partitions can be rebalanced.
> >> >> > > >>
> >> >> > > >> I wouldn't get too hung up on commit/heartbeat behavior. The crux of
> >> >> > > >> the issue is that you need to call poll() often enough to avoid
> >> >> > > >> getting timed out by the coordinator. If you find this happening
> >> >> > > >> frequently, you probably need to increase session.timeout.ms. There's
> >> >> > > >> not really any downside to doing so other than that hard failures (in
> >> >> > > >> which the consumer can't be shut down cleanly) will take a little
> >> >> > > >> longer to detect. Normal shutdown doesn't have this problem. It can
> >> >> > > >> be difficult in 0.9 to ensure that poll() is called often enough
> >> >> > > >> since you don't have direct control over the amount of data returned
> >> >> > > >> in poll(), but we're adding an option (max.poll.records) in 0.10
> >> >> > > >> which hopefully can be set conservatively enough to make this problem
> >> >> > > >> go away.
> >> >> > > >>
> >> >> > > >> -Jason
> >> >> > > >>
> >> >> > > >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma <vinsharma.tech@gmail.com>
> >> >> > > >> wrote:
> >> >> > > >>
> >> >> > > >> > Hey,
> >> >> > > >> >
> >> >> > > >> > I am working on a simplified test case to check if there is any
> >> >> > > >> > issue in my code. Just to make sure that none of my assumptions are
> >> >> > > >> > wrong, it would be great if you could help me find answers to the
> >> >> > > >> > following queries:-
> >> >> > > >> >
> >> >> > > >> > 1) Is it correct to say that each commitSync will trigger a
> >> >> > > >> > HeartBeatTask? If no heartbeat has been sent within the specified
> >> >> > > >> > heartbeat interval, should I then see a successful heartbeat
> >> >> > > >> > response or a failure message in the logs near the commitSync
> >> >> > > >> > success log?
> >> >> > > >> > 2) Is it correct to say that a metadata refresh will not act as a
> >> >> > > >> > heartbeat, will not trigger heartBeatTask and will not reset
> >> >> > > >> > heartBeatTask?
> >> >> > > >> > 3) Where is a consumer session maintained? Let's say my consumer is
> >> >> > > >> > listening to 3 partitions on a 3-broker cluster where each broker
> >> >> > > >> > is the leader of 1 partition. Will each of the brokers have a
> >> >> > > >> > session for my consumer, or is there just 1 session maintained
> >> >> > > >> > somewhere in common, like ZooKeeper?
> >> >> > > >> > 4) In the above setup, during long processing, if I commit a record
> >> >> > > >> > through commitSync, which triggers a heartbeat request, and a
> >> >> > > >> > successful response is received for it, then what does this
> >> >> > > >> > response mean? Does it mean that my session with each broker is
> >> >> > > >> > renewed? Or does it mean that just the leader for the partition of
> >> >> > > >> > the committed record knows that my consumer is alive, and the
> >> >> > > >> > consumer's session on the other brokers will still time out?
> >> >> > > >> >
> >> >> > > >> > Regards,
> >> >> > > >> > Vinay Sharma
> >> >> > > >> >
> >> >> > > >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson <jason@confluent.io>
> >> >> > > >> > wrote:
> >> >> > > >> >
> >> >> > > >> > > Hey Vinay,
> >> >> > > >> > >
> >> >> > > >> > > Are you saying that heartbeats are not sent while a metadata
> >> >> > > >> > > refresh is in progress? Do you have any logs which show us the
> >> >> > > >> > > apparent problem?
> >> >> > > >> > >
> >> >> > > >> > > Thanks,
> >> >> > > >> > > Jason
> >> >> > > >> > >
> >> >> > > >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <vinsharma.tech@gmail.com>
> >> >> > > >> > > wrote:
> >> >> > > >> > >
> >> >> > > >> > > > Hi Ismael,
> >> >> > > >> > > >
> >> >> > > >> > > > Treating commitSync as a heartbeat will definitely resolve the
> >> >> > > >> > > > issue I am facing, but the reason behind my issue does not seem
> >> >> > > >> > > > to be what is mentioned in the defect (i.e. frequent commitSync
> >> >> > > >> > > > requests).
> >> >> > > >> > > >
> >> >> > > >> > > > I am sending commitSync periodically only to keep my session
> >> >> > > >> > > > alive when my consumer is still processing records and is close
> >> >> > > >> > > > to the session timeout (tried the 10th / 12th / 15th / 20th
> >> >> > > >> > > > second after poll was called, where the session timeout is 30).
> >> >> > > >> > > > I see a heartbeat response received in the logs along with each
> >> >> > > >> > > > commitSync call, but this stops after a metadata refresh request
> >> >> > > >> > > > is issued. I see in the logs that the commit succeeds, but there
> >> >> > > >> > > > is no heartbeat response received message in the logs after the
> >> >> > > >> > > > metadata refresh until the next poll.
> >> >> > > >> > > >
> >> >> > > >> > > > Regards,
> >> >> > > >> > > > Vinay Sharma
> >> >> > > >> > > >
> >> >> > > >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <ismael@juma.me.uk>
> >> >> > > >> > > > wrote:
> >> >> > > >> > > >
> >> >> > > >> > > > > Hi Vinay,
> >> >> > > >> > > > >
> >> >> > > >> > > > > This was fixed via
> >> >> > > >> > > > > https://issues.apache.org/jira/browse/KAFKA-3470 (will be
> >> >> > > >> > > > > part of 0.10.0.0).
> >> >> > > >> > > > >
> >> >> > > >> > > > > Ismael
> >> >> > > >> > > > >
> >> >> > > >> > > > >
> >> >> > > >> > > > >
> >> >> > > >> > > > > On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma <vinsharma.tech@gmail.com>
> >> >> > > >> > > > > wrote:
> >> >> > > >> > > > >
> >> >> > > >> > > > > > Hello,
> >> >> > > >> > > > > >
> >> >> > > >> > > > > > I am using client API 0.9.0.1 and facing an issue. As per my
> >> >> > > >> > > > > > logs it seems that on each commitSync(offsets) a heartbeat
> >> >> > > >> > > > > > request is sent, but after a metadata refresh request, until
> >> >> > > >> > > > > > the next poll(), commits do not send any heartbeat request.
> >> >> > > >> > > > > >
> >> >> > > >> > > > > > The KafkaConsumers I create sometimes get a session timeout
> >> >> > > >> > > > > > due to no heartbeat, especially during longer processing
> >> >> > > >> > > > > > times. I call commitSync(offsets) at regular intervals to
> >> >> > > >> > > > > > keep the session alive when processing takes longer than
> >> >> > > >> > > > > > usual. Everything works fine if the commit intervals are very
> >> >> > > >> > > > > > small or if I commit after each record, but if I commit, let's
> >> >> > > >> > > > > > say, every 12 seconds and the session timeout is 30 seconds,
> >> >> > > >> > > > > > then I can see the consumer getting timed out sometimes.
> >> >> > > >> > > > > >
> >> >> > > >> > > > > > Any help or pointers will be much appreciated. Thanks in
> >> >> > > >> > > > > > advance.
> >> >> > > >> > > > > >
> >> >> > > >> > > > > > Regards,
> >> >> > > >> > > > > > Vinay sharma
> >> >> > > >> > > > > >
> >> >> > > >> > > > >
> >> >> > > >> > > >
> >> >> > > >> > >
> >> >> > > >> >
> >> >> > > >>
> >> >> > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

I was also wondering: if commitSync acts as a heartbeat, why do we still
trigger a heartbeat request on commit? Why not just reset the heartbeat
timer on a successful commitSync? Or am I wrong, and we do this already?
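
To make the question concrete, the two behaviours can be compared with a
toy model. The sketch below is not the Kafka client's code: HeartbeatClock
and its methods are hypothetical names, and the only assumption is that the
session timer restarts whenever the model records a successful response.
Under that assumption, onCommitSuccess() shows the "just reset the timer on
a successful commitSync" variant asked about above.

```java
/**
 * Toy model of heartbeat bookkeeping. Hypothetical names throughout;
 * this is NOT how the Kafka client is implemented.
 */
final class HeartbeatClock {
    private final long heartbeatIntervalMs; // plays the role of heartbeat.interval.ms
    private final long sessionTimeoutMs;    // plays the role of session.timeout.ms
    private long lastResetMs;               // last time the session timer was restarted

    HeartbeatClock(long heartbeatIntervalMs, long sessionTimeoutMs, long nowMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastResetMs = nowMs;
    }

    /** A heartbeat is "due" once a full interval has passed since the last reset. */
    boolean heartbeatDue(long nowMs) {
        return nowMs - lastResetMs >= heartbeatIntervalMs;
    }

    /** The session counts as expired once the timeout passes with no reset. */
    boolean sessionExpired(long nowMs) {
        return nowMs - lastResetMs > sessionTimeoutMs;
    }

    /** Record a successful heartbeat response. */
    void onHeartbeatSuccess(long nowMs) {
        lastResetMs = nowMs;
    }

    /** Assumed variant: a successful commit restarts the timer as well. */
    void onCommitSuccess(long nowMs) {
        lastResetMs = nowMs;
    }
}
```

With a 3000 ms interval and a 30000 ms session timeout, a commit recorded
at 15000 ms means no heartbeat is due again until 18000 ms in this model,
so no separate heartbeat request would be needed right after the commit.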

Regards,
Vinay

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Hi Jason,

Attached is a simple class with a main method. I used it to reproduce the
issue and to generate the logs that I attached earlier. The class contains
the poller code snippets relevant to the issue.
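
The proactive-commit logic described in this thread could look roughly like
the following. This is a simplified sketch, not the attached class:
ProactiveCommitter is a hypothetical name, partitions are plain ints, the
map starts empty instead of being seeded with the committed positions, and
commitFn stands in for a call to consumer.commitSync(offsets). Committing
offset + 1 follows the Kafka convention that the committed offset is the
next one to read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: tracks processed offsets and commits them at a fixed interval. */
final class ProactiveCommitter {
    private final long commitIntervalMs;
    private final Consumer<Map<Integer, Long>> commitFn;        // stand-in for commitSync(offsets)
    private final Map<Integer, Long> pending = new HashMap<>(); // partition -> next offset to read
    private long lastCommitMs;

    ProactiveCommitter(long commitIntervalMs, long nowMs, Consumer<Map<Integer, Long>> commitFn) {
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = nowMs;
        this.commitFn = commitFn;
    }

    /** Call after each record is processed; commits if the interval has elapsed. */
    void recordProcessed(int partition, long offset, long nowMs) {
        pending.put(partition, offset + 1);
        if (nowMs - lastCommitMs >= commitIntervalMs) {
            flush(nowMs);
        }
    }

    /** Commit whatever is pending (if anything), then start over with an empty map. */
    void flush(long nowMs) {
        if (!pending.isEmpty()) {
            commitFn.accept(new HashMap<>(pending)); // pass a copy so the caller's view is stable
            pending.clear();
        }
        lastCommitMs = nowMs;
    }
}
```

With the real client, the map would be a Map<TopicPartition, OffsetAndMetadata>
and commitFn would wrap consumer.commitSync(map); flush() would also be
called once at the end of each batch returned by poll().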

Regards,
Vinay Sharma

On Thu, Apr 28, 2016 at 3:30 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hey Vinay,
>
> Thanks, that's really helpful. It does seem like there might be a problem
> with the heartbeat trigger logic. I'll see if I can reproduce what you're
> seeing locally. Might be helpful if you share a snippet of your poll loop.
>
> Thanks,
> Jason
>
> On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma <vi...@gmail.com>
> wrote:
>
> > Hi Jason,
> >
> > I reverted to KAFKA-3149. The producer still had schema-related issues,
> > but my consumer worked.
> >
> > The consumer now works as expected. I did not encounter an error and the
> > generation was not marked dead by the coordinator, but I still see that
> > successful heartbeat responses are not logged as expected.
> > My observations are as follows:
> > 1) A meta refresh also triggers a heartbeat request. I say this because
> > I sometimes see 2 heartbeat responses logged just a few milliseconds apart
> > where a meta refresh and a proactive commit happened almost simultaneously.
> > 2) I still see that some commitSync requests do not have a heartbeat
> > logged before or after the commit. The next proactive commit happened just
> > in time, and that time the heartbeat request was successful, which saved
> > the session. In the attached log you can see that the poll was done at
> > 14:17:41, a commit happened at 14:17:56, and another commit happened at
> > 14:18:14. The only heartbeat response logged during this time is at
> > 14:18:14, which is 29 seconds after the poll, whereas a commit was
> > performed 15 seconds after the poll. The heartbeat interval was 3000.
> > 3) There are long pauses between heartbeat responses in the logs which
> > should cause the session to time out, but that is not happening. This
> > implies that commits trigger a heartbeat but also act as a heartbeat.
> >
> >
> > Regards,
> > Vinay
> >
> >
> > On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> >> Ah, yeah. That's probably caused by the new topic metadata version, which
> >> isn't supported on 0.9 brokers. To test on trunk, you'd have to upgrade
> >> the brokers as well. Either that or you can rewind to before KAFKA-3306
> >> (which was just committed the day before yesterday)?
> >>
> >> -Jason
> >>
> >> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vinsharma.tech@gmail.com>
> >> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > I built kafka-client and tried using it, but my producers and consumers
> >> > started throwing the exception below. Is 0.10 not going to be compatible
> >> > with brokers on version 0.9.0.1? Or do I need to make some config changes
> >> > to producers / consumers to make them compatible with brokers on the old
> >> > version? Or do I need to upgrade the brokers to the new version as well?
> >> >
> >> > org.apache.kafka.common.protocol.types.SchemaException: Error reading
> >> > field 'brokers': Error reading field 'host': Error reading string of
> >> > length 17995, only 145 bytes available
> >> > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> >> > at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
> >> > at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
> >> >
> >> > Regards,
> >> > Vinay Sharma
> >> >
> >> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <jason@confluent.io>
> >> > wrote:
> >> >
> >> > > Hey Vinay,
> >> > >
> >> > > Any chance you can run the same test against trunk? I'm guessing this
> >> > > might be caused by a bug in the 0.9 consumer which basically causes some
> >> > > requests to fail when a bunch of them are sent to the broker at the same
> >> > > time.
> >> > >
> >> > > -Jason
> >> > >
> >> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <vinsharma.tech@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Jason,
> >> > > >
> >> > > > This makes sense. We use 0.9.0.1 and we do have the session timeout set
> >> > > > a bit high, but nothing can guarantee that processing will never take
> >> > > > longer than the session timeout. I am trying to test a proactive commit
> >> > > > approach to handle cases where processing takes an unusually long time.
> >> > > > To keep the consumer's session alive during long processing, I
> >> > > > proactively commitSync processed records every 15 seconds. The session
> >> > > > timeout I kept is 30000.
> >> > > >
> >> > > > *Problem:-*
> >> > > > With the heartbeat interval at 3000, I expect a heartbeat request to be
> >> > > > sent on each proactive commit, which happens every 15 seconds. In my
> >> > > > tests this does not always happen. I see a time window of more than 30
> >> > > > seconds in which no heartbeat is sent even though there were commits in
> >> > > > that window. After this window I see a couple of successful heartbeat
> >> > > > responses until the end of that poll, but as soon as I poll again and
> >> > > > call commitSync in the next poll I get an "ILLEGAL_GENERATION" error.
> >> > > > This error always happens just after a meta refresh, or in the next
> >> > > > poll's processing after a meta refresh. I am attaching logs where I kept
> >> > > > the meta refresh interval at 40000, 90000, and 500000.
> >> > > >
> >> > > > *Test results:-*
> >> > > > Test with meta refresh 40000 ms ran around 70 seconds from 1st poll.
> >> > > > Test with meta refresh 90000 ms ran around 120 seconds from 1st poll.
> >> > > > Test with meta refresh 500000 ms ran around 564 seconds from 1st poll.
> >> > > >
> >> > > > Every test falls in line with the cases above: the generation is marked
> >> > > > dead some time after a meta refresh. A meta refresh before the 1st poll
> >> > > > does not create any issue, but the ones after a poll and during long
> >> > > > processing do.
> >> > > >
> >> > > > *Environment:-*
> >> > > > My setup has 3 brokers and 1 zk. The topic has 3 partitions and a
> >> > > > replication factor of 3. Messages are already published to the topic.
> >> > > >
> >> > > > *Logic used in test cases:-*
> >> > > > On each poll I initialize a map with the current committed offset
> >> > > > position of the partitions being consumed. I update this map after each
> >> > > > record is processed and use it to proactively commit every 15 seconds.
> >> > > > The map is initialized again after a proactive commit.
> >> > > >
> >> > > > I am not sure what is wrong here, but I do not see any issue in the code
> >> > > > or in the offset commits. Log files and a class with a main method are
> >> > > > attached for your reference.
> >> > > >
> >> > > > Regards,
> >> > > > Vinay Sharma
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <jason@confluent.io>
> >> > > > wrote:
> >> > > >
> >> > > >> Hi Vinay,
> >> > > >>
> >> > > >> Answers below:
> >> > > >>
> >> > > >> > 1)  Is it correct to say that each commitSync will trigger a
> >> > > >> > HeartBeatTask? If there is no hear beat sent in past since specified
> >> > > >> > heartbeat interval then i should see a successful heartbeat response
> >> > > >> > or failure message in logs near to commitSync success log?
> >> > > >>
> >> > > >> Not quite. Heartbeats are sent periodically according to the
> >> > > >> heartbeat.interval.ms configuration. However, since the consumer has no
> >> > > >> background thread, they can only be sent in API calls such as poll() or
> >> > > >> commitSync(). So calling commitSync() may or may not result in a
> >> > > >> heartbeat depending only on whether one is "due."
> >> > > >>
> >> > > >> > 2) is it correct to say that Meta Data refresh will not act as
> >> > > >> > heartbeat, will not trigger heartBeatTask and will not reset
> >> > > >> > heartBeatTask?
> >> > > >>
> >> > > >> That is correct. Metadata refreshes are not related to heartbeats.
> >> > > >>
> >> > > >> > 3) Where does a consumer session maintained? Lets say my consumer is
> >> > > >> > listening to 3 partitions on a 3 broker cluster where each broker is
> >> > > >> > leader of 1 partition. So will each of the brokers will have a session
> >> > > >> > for my consumer or is it just 1 session maintained somewhere in common
> >> > > >> > like zookeeper?
> >> > > >>
> >> > > >> One of the brokers serves as the "group coordinator." When the consumer
> >> > > >> starts up, it sends a GroupCoordinator request to one of the brokers to
> >> > > >> find out who the coordinator is. Currently, coordinators are chosen from
> >> > > >> among the leaders of the partitions of the __consumer_offsets topic.
> >> > > >> This lets us take advantage of the leader election process to also
> >> > > >> handle coordinator failures. The coordinator of each group maintains
> >> > > >> state for the group and keeps track of session timeouts.
> >> > > >>
> >> > > >> > 4) In above setup, during a long processing if I commit a record
> >> > > >> > through commmitSync which triggers a hear beat request and a successful
> >> > > >> > response is received for the same then what does this response means?
> >> > > >> > does it mean that my session with each broker is renewed? or does it
> >> > > >> > mean that just the leader for partition of committed record knows that
> >> > > >> > my consumer is alive and consumer's session on other brokers will still
> >> > > >> > timeout?
> >> > > >>
> >> > > >> The coordinator is the only broker that is aware of a consumer's session
> >> > > >> and all offset commits are sent to it. Successful heartbeats mean that
> >> > > >> the session is still active. Heartbeats are also used to let the
> >> > > >> consumer discover when a rebalance has begun. If a new member joins the
> >> > > >> group, then the coordinator returns an error code in the heartbeat
> >> > > >> responses of the active members to let them know that they need to
> >> > > >> rejoin the group so that partitions can be rebalanced.
> >> > > >>
> >> > > >> I wouldn't get too hung up on commit/heartbeat behavior. The crux of the
> >> > > >> issue is that you need to call poll() often enough to avoid getting
> >> > > >> timed out by the coordinator. If you find this happening frequently, you
> >> > > >> probably need to increase session.timeout.ms. There's not really any
> >> > > >> downside to doing so other than that hard failures (in which the
> >> > > >> consumer can't be shutdown cleanly) will take a little longer to detect.
> >> > > >> Normal shutdown doesn't have this problem. It can be difficult in 0.9 to
> >> > > >> ensure that poll() is called often enough since you don't have direct
> >> > > >> control over the amount of data returned in poll(), but we're adding an
> >> > > >> option (max.poll.records) in 0.10 which hopefully can be set
> >> > > >> conservatively enough to make this problem go away.
> >> > > >>
> >> > > >> -Jason
> >> > > >>
> >> > > >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma <
> >> > vinsharma.tech@gmail.com
> >> > > >
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Hey,
> >> > > >> >
> >> > > >> > I am working on a simplified test case to check if there is any
> >> > issue
> >> > > >> in my
> >> > > >> > code. Just to make sure that any of my assumptions are not
> >> wrong, it
> >> > > >> will
> >> > > >> > be great if you can please help me in finding answers to
> >> following
> >> > > >> > queries:-
> >> > > >> >
> >> > > >> > 1)  Is it correct to say that each commitSync will trigger a
> >> > > >> HeartBeatTask?
> >> > > >> > If there is no hear beat sent in past since specified heartbeat
> >> > > interval
> >> > > >> > then i should see a successful heartbeat response or failure
> >> message
> >> > > in
> >> > > >> > logs near to commitSync success log?
> >> > > >> > 2) is it correct to say that Meta Data refresh will not act as
> >> > > >> heartbeat,
> >> > > >> > will not trigger heartBeatTask and will not reset
> heartBeatTask?
> >> > > >> > 3) Where does a consumer session maintained? Lets say my
> >> consumer is
> >> > > >> > listening to 3 partitions on a 3 broker cluster where each
> >> broker is
> >> > > >> leader
> >> > > >> > of 1 partition. So will each of the brokers will have a session
> >> for
> >> > my
> >> > > >> > consumer or is it just 1 session maintained somewhere in common
> >> like
> >> > > >> > zookeeper?
> >> > > >> > 4) In above setup, during a long processing if I commit a
> record
> >> > > through
> >> > > >> > commmitSync which triggers a hear beat request and a successful
> >> > > >> response is
> >> > > >> > received for the same then what does this response means? does
> it
> >> > mean
> >> > > >> that
> >> > > >> > my session with each broker is renewed? or does it mean that
> just
> >> > the
> >> > > >> > leader for partition of committed record knows that my consumer
> >> is
> >> > > alive
> >> > > >> > and consumer's session on other brokers will still timeout?
> >> > > >> >
> >> > > >> > Regards,
> >> > > >> > Vinay Sharma
> >> > > >> >
> >> > > >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson <
> >> > jason@confluent.io>
> >> > > >> > wrote:
> >> > > >> >
> >> > > >> > > Hey Vinay,
> >> > > >> > >
> >> > > >> > > Are you saying that heartbeats are not sent while a metadata
> >> > refresh
> >> > > >> is
> >> > > >> > in
> >> > > >> > > progress? Do you have any logs which show us the apparent
> >> problem?
> >> > > >> > >
> >> > > >> > > Thanks,
> >> > > >> > > Jason
> >> > > >> > >
> >> > > >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <
> >> > > >> vinsharma.tech@gmail.com>
> >> > > >> > > wrote:
> >> > > >> > >
> >> > > >> > > > Hi Ismael,
> >> > > >> > > >
> >> > > >> > > > Treating commitSync as heartbeat will definitely resolve
> the
> >> > issue
> >> > > >> i am
> >> > > >> > > > facing but the reason behind my issue does not seem to be
> >> what
> >> > > >> > mentioned
> >> > > >> > > in
> >> > > >> > > > defect (i.e frequent commitSync requests).
> >> > > >> > > >
> >> > > >> > > > I am sending CommitSync periodically only to keep my
> session
> >> > alive
> >> > > >> when
> >> > > >> > > my
> >> > > >> > > > consumer is still processing records and is close to
> session
> >> > time
> >> > > >> out
> >> > > >> > > > (tried 10th / 12th / 15th / 20th second after poll called
> >> where
> >> > > >> session
> >> > > >> > > > time is 30). I see heartbeat response received in logs
> along
> >> > with
> >> > > >> each
> >> > > >> > > > commitSync call but this stops after a meta data refresh
> >> request
> >> > > is
> >> > > >> > > issued.
> >> > > >> > > > I see in logs that commit goes successful but no heartbeat
> >> > > response
> >> > > >> > > > received message in logs after meta refresh till next poll.
> >> > > >> > > >
> >> > > >> > > > Regards,
> >> > > >> > > > Vinay Sharma
> >> > > >> > > >
> >> > > >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <
> >> ismael@juma.me.uk
> >> > >
> >> > > >> > wrote:
> >> > > >> > > >
> >> > > >> > > > > Hi Vinay,
> >> > > >> > > > >
> >> > > >> > > > > This was fixed via
> >> > > >> https://issues.apache.org/jira/browse/KAFKA-3470
> >> > > >> > > > (will
> >> > > >> > > > > be part of 0.10.0.0).
> >> > > >> > > > >
> >> > > >> > > > > Ismael
> >> > > >> > > > >
> >> > > >> > > > >
> >> > > >> > > > >
> >> > > >> > > > > On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma <
> >> > > >> > > vinsharma.tech@gmail.com>
> >> > > >> > > > > wrote:
> >> > > >> > > > >
> >> > > >> > > > > > Hello,
> >> > > >> > > > > >
> >> > > >> > > > > > I am using client API 0.9.0.1 and facing an issue. As
> >> per my
> >> > > >> logs
> >> > > >> > it
> >> > > >> > > > > seems
> >> > > >> > > > > > that on each commitSync(Offsets) a heartbeat request is
> >> sent
> >> > > but
> >> > > >> > > after
> >> > > >> > > > a
> >> > > >> > > > > > metada refresh request till next poll(), commits do not
> >> send
> >> > > any
> >> > > >> > > > hearbeat
> >> > > >> > > > > > request.
> >> > > >> > > > > >
> >> > > >> > > > > > KafkaConsumers i create sometimes get session time out
> >> due
> >> > to
> >> > > no
> >> > > >> > > > hearbeat
> >> > > >> > > > > > specially during longer processing times. I call
> >> > > >> > CommitSync(offsets)
> >> > > >> > > > > after
> >> > > >> > > > > > regular intervals to keep session alive when processing
> >> > takes
> >> > > >> > longer
> >> > > >> > > > than
> >> > > >> > > > > > usual. Every thing works fine if commit intervals are
> >> very
> >> > > >> small or
> >> > > >> > > if
> >> > > >> > > > i
> >> > > >> > > > > > commit after each record but if i commit lets say every
> >> 12
> >> > > >> seconds
> >> > > >> > > and
> >> > > >> > > > 30
> >> > > >> > > > > > seconds is session time then i can see consumer getting
> >> > timed
> >> > > >> out
> >> > > >> > > > > > sometimes.
> >> > > >> > > > > >
> >> > > >> > > > > > Any help or pointers will be much appreciated. Thanks
> in
> >> > > >> advance.
> >> > > >> > > > > >
> >> > > >> > > > > > Regards,
> >> > > >> > > > > > Vinay sharma
> >> > > >> > > > > >
> >> > > >> > > > >
> >> > > >> > > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: No Heartbeat request on commit

Posted by Jason Gustafson <ja...@confluent.io>.

Hey Vinay,

Thanks, that's really helpful. It does seem like there might be a problem
with the heartbeat trigger logic. I'll see if I can reproduce what you're
seeing locally. Might be helpful if you share a snippet of your poll loop.

Thanks,
Jason
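
(Vinay's poll loop was sent as an attachment, SimpleKafkaConsumer.zip, and is
not reproduced in this archive. As a rough stand-in, here is a minimal sketch
of the bookkeeping a proactive-commit loop like the one described in this
thread would need. It is not the attached class: ProactiveCommitTracker, its
method names, and the string partition keys are hypothetical, and it starts
with an empty map rather than the committed positions. In a real consumer the
map would be keyed by TopicPartition and the drained snapshot passed to
commitSync(offsets). The sketch assumes the usual Kafka convention that the
committed offset is the offset of the next record to read.)

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: tracks processed offsets and decides when to commit. */
public class ProactiveCommitTracker {
    private final long commitIntervalMs;
    // partition name -> next offset to read (last processed offset + 1)
    private final Map<String, Long> pending = new HashMap<>();
    private long lastCommitMs;

    public ProactiveCommitTracker(long commitIntervalMs, long nowMs) {
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = nowMs;
    }

    /** Call after each record is fully processed. */
    public void recordProcessed(String partition, long offset) {
        // Keep the highest position seen so a late call cannot move it backwards.
        pending.merge(partition, offset + 1, Long::max);
    }

    /** True when there is something to commit and the interval has elapsed. */
    public boolean commitDue(long nowMs) {
        return !pending.isEmpty() && nowMs - lastCommitMs >= commitIntervalMs;
    }

    /** Returns the offsets to commit and resets the tracker. */
    public Map<String, Long> drain(long nowMs) {
        Map<String, Long> snapshot = new HashMap<>(pending);
        pending.clear();
        lastCommitMs = nowMs;
        return snapshot;
    }

    public static void main(String[] args) {
        ProactiveCommitTracker tracker = new ProactiveCommitTracker(15_000L, 0L);
        tracker.recordProcessed("topic-0", 41L);
        if (tracker.commitDue(15_000L)) {
            System.out.println("would commit: " + tracker.drain(15_000L));
        }
    }
}
```

Passing the clock value in explicitly keeps the sketch easy to test; a real
loop would call System.currentTimeMillis() after each processed record.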

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Hi Jason,

I reverted back to KAFKA-3149. The producer still had schema-related issues,
but my consumer worked.

The consumer now works as expected. I did not encounter an error and the
generation was not marked dead by the coordinator, but I still see that
successful heartbeat responses are not logged as expected.
My observations are as follows:
1) A metadata refresh also triggers a heartbeat request. I say this because
I sometimes see 2 heartbeat responses logged just a few milliseconds apart
when a metadata refresh and a proactive commit happened almost
simultaneously.
2) I still see that some commitSync requests do not have a heartbeat logged
before or after the commit. The next proactive commit happened just in time,
and that time the heartbeat request was successful, which saved the session.
In the attached log you can see that the poll was done at 14:17:41, a commit
happened at 14:17:56, and another commit happened at 14:18:14. The only
heartbeat response logged during this time is at 14:18:14, which is 29
seconds after the poll, whereas a commit was performed 15 seconds after the
poll. The heartbeat interval was 3000 ms.
3) There are long pauses between heartbeat responses in the logs which should
cause the session to time out, but that is not happening. This implies that
commits not only trigger a heartbeat but also act as a heartbeat themselves.


Regards,
Vinay
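
(For reference, the timing argument in observations 2 and 3 can be checked
with a small model. This is only a sketch under stated assumptions, not the
client's actual logic: HeartbeatModel and its methods are hypothetical, and
it assumes a heartbeat goes out during an API call whenever at least
heartbeat.interval.ms has passed since the previous one, and that the session
expires once the gap exceeds session.timeout.ms. The values used are the ones
quoted in this thread: interval 3000 ms, session timeout 30000 ms, and
commits at 15 s and 29 s after the poll.)

```java
/** Hypothetical model of "heartbeat is sent only when due, inside an API call". */
public class HeartbeatModel {
    private final long heartbeatIntervalMs;
    private final long sessionTimeoutMs;
    private long lastHeartbeatMs;

    public HeartbeatModel(long heartbeatIntervalMs, long sessionTimeoutMs, long startMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastHeartbeatMs = startMs;
    }

    /** Simulates poll() or commitSync() at nowMs; returns true if a heartbeat is sent. */
    public boolean onApiCall(long nowMs) {
        if (nowMs - lastHeartbeatMs >= heartbeatIntervalMs) {
            lastHeartbeatMs = nowMs;
            return true;
        }
        return false;
    }

    /** True if no heartbeat has been sent for longer than the session timeout. */
    public boolean sessionExpired(long nowMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        // Expected case: both commits carry a heartbeat because one is due each time.
        HeartbeatModel expected = new HeartbeatModel(3_000L, 30_000L, 0L);
        System.out.println("commit at 15 s sends heartbeat: " + expected.onApiCall(15_000L));
        System.out.println("commit at 29 s sends heartbeat: " + expected.onApiCall(29_000L));

        // Observed case: the commit at 15 s carried no heartbeat, so the last one was at 0 s.
        HeartbeatModel observed = new HeartbeatModel(3_000L, 30_000L, 0L);
        System.out.println("expired at 29 s: " + observed.sessionExpired(29_000L));
        System.out.println("expired at 31 s: " + observed.sessionExpired(31_000L));
    }
}
```

Under these assumptions a heartbeat should accompany both commits, and a
missed heartbeat at 15 s leaves only about one second of margin at 29 s,
which matches the "just in time" description above.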


On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <ja...@confluent.io>
wrote:

> Ah, yeah. That's probably caused by the new topic metadata version, which
> isn't supported on 0.9 brokers. To test on trunk, you'd have to upgrade the
> brokers as well. Either that or you can rewind to before KAFKA-3306 (which
> was just committed the day before yesterday)?
>
> -Jason
>
> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vi...@gmail.com>
> wrote:
>
> > Hi Jason,
> >
> > I build kafka-client and tried using it but my producers and consumers
> > started throwing below exception. Is 0.10 not going to be compatible with
> > brokers on version 0.9.0.1? or do i need to make some config changes to
> > producers / consumers to make them compatible with brokers on old
> version?
> > or do i need to upgrade brokers to new version as well?
> >
> >  org.apache.kafka.common.protocol.types.SchemaException: Error reading
> > field 'brokers': Error reading field 'host': Error reading string of
> length
> > 17995, only 145 bytes available
> > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> > at
> >
> >
> org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
> > at
> >
> >
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
> >
> > Regards,
> > Vinay Sharma
> >
> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Vinay,
> > >
> > > Any chance you can run the same test against trunk? I'm guessing this
> > might
> > > be caused by a bug in the 0.9 consumer which basically causes some
> > requests
> > > to fail when a bunch of them are sent to the broker at the same time.
> > >
> > > -Jason
> > >
> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <
> vinsharma.tech@gmail.com>
> > > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > This makes sense.We use 0.9.0.1 and we do have session timeout set a
> > bit
> > > > high but nothing can guarantee that there will be no case when
> > processing
> > > > may not go higher than session timeout. I am trying to test a
> proactive
> > > > commit approach to handle such cases when processing takes unusually
> > long
> > > > time. To keep consumer's session alive during long processing time i
> > > > proactively commitSync processed records every 15 seconds. Session
> > > timeout
> > > > i kept is 30000.
> > > >
> > > > *Problem:-*
> > > > With heart beat interval is 3000 then i expect a hearbeat request to
> be
> > > > sent on each proactive commit which happens every 15 seconds. In my
> > > tests i
> > > > see that this does not happen always. I see a time window which is
> > > greater
> > > > than 30 seconds where no hearbeat is sent even thought there were
> > commits
> > > > in this duration. After this window i see a couple of successful
> > > heartbeat
> > > > responses till the end of poll but as soon as i poll again and call
> > > > commitSync in next poll i get "ILLEGAL_GENERATION" error. This error
> > > always
> > > > happen just after meta refresh or in next poll processing after a
> meta
> > > > refresh. I am attaching logs where i kept meta refresh interval
> 40000,
> > > > 90000, 500000.
> > > >
> > > > *Test results *:-
> > > > Test with meta refresh 40000 ms ran around 70 seconds from 1st poll.
> > > > Test with meta refresh 90000 ms ran around 120 seconds from 1st poll.
> > > > Test with meta refresh 500000 ms ran around 564 seconds from 1st
> poll.
> > > >
> > > > Every test falls in line with above test cases where generation is
> > marked
> > > > dead some time after a meta refresh. Meta refresh before 1st poll
> does
> > > not
> > > > create any issue but the ones after poll and during long processing
> do.
> > > >
> > > > *Environment:-*
> > > > My setup has 3 brokers 1 zk. Topic has 3 partitions ans has
> replication
> > > > factor 3. Messages are already published to topic.
> > > >
> > > > *Logic used in test cases :- *
> > > > On each poll I initialize a map with current committed offset
> position
> > of
> > > > partitions being consumed. I update this map after each record
> > processing
> > > > and use this map to proactively commit every 15 seconds. Map is
> > > initialized
> > > > again after a proactive commit.
> > > >
> > > > I am not sure what is wrong here but i do not see any issue in code
> or
> > > > offset commits going on. Log files and a class with main method are
> > > > attached for your reference.
> > > >
> > > > Regards,
> > > > Vinay Sharma
> > > >
> > > >
> > > >
> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <jason@confluent.io
> >
> > > > wrote:
> > > >
> > > >> Hi Vinay,
> > > >>
> > > >> Answers below:
> > > >>
> > > >> 1)  Is it correct to say that each commitSync will trigger a
> > > >> HeartBeatTask?
> > > >> > If there is no hear beat sent in past since specified heartbeat
> > > interval
> > > >> > then i should see a successful heartbeat response or failure
> message
> > > in
> > > >> > logs near to commitSync success log?
> > > >>
> > > >>
> > > >> Not quite. Heartbeats are sent periodically according to the
> > > > >> heartbeat.interval.ms configuration. However, since the consumer has
> > > > >> no background thread, they can only be sent in API calls such as
> > > > >> poll() or commitSync(). So calling commitSync() may or may not result
> > > > >> in a heartbeat depending only on whether one is "due."
> > > > >>
> > > > >> > 2) Is it correct to say that a metadata refresh will not act as a
> > > > >> > heartbeat, will not trigger heartBeatTask and will not reset
> > > > >> > heartBeatTask?
> > > > >>
> > > > >> That is correct. Metadata refreshes are not related to heartbeats.
> > > > >>
> > > > >> > 3) Where is a consumer session maintained? Let's say my consumer is
> > > > >> > listening to 3 partitions on a 3-broker cluster where each broker
> > > > >> > is the leader of 1 partition. Will each of the brokers have a
> > > > >> > session for my consumer, or is there just 1 session maintained
> > > > >> > somewhere in common, like ZooKeeper?
> > > > >>
> > > > >> One of the brokers serves as the "group coordinator." When the
> > > > >> consumer starts up, it sends a GroupCoordinator request to one of the
> > > > >> brokers to find out who the coordinator is. Currently, coordinators
> > > > >> are chosen from among the leaders of the partitions of the
> > > > >> __consumer_offsets topic. This lets us take advantage of the leader
> > > > >> election process to also handle coordinator failures. The coordinator
> > > > >> of each group maintains state for the group and keeps track of
> > > > >> session timeouts.
> > > > >>
> > > > >> > 4) In the above setup, if during long processing I commit a record
> > > > >> > through commitSync, which triggers a heartbeat request, and a
> > > > >> > successful response is received, what does this response mean?
> > > > >> > Does it mean that my session with each broker is renewed? Or does
> > > > >> > it mean that just the leader for the partition of the committed
> > > > >> > record knows that my consumer is alive, and the consumer's session
> > > > >> > on other brokers will still time out?
> > > > >>
> > > > >> The coordinator is the only broker that is aware of a consumer's
> > > > >> session and all offset commits are sent to it. Successful heartbeats
> > > > >> mean that the session is still active. Heartbeats are also used to
> > > > >> let the consumer discover when a rebalance has begun. If a new member
> > > > >> joins the group, then the coordinator returns an error code in the
> > > > >> heartbeat responses of the active members to let them know that they
> > > > >> need to rejoin the group so that partitions can be rebalanced.
> > > > >>
> > > > >> I wouldn't get too hung up on commit/heartbeat behavior. The crux of
> > > > >> the issue is that you need to call poll() often enough to avoid
> > > > >> getting timed out by the coordinator. If you find this happening
> > > > >> frequently, you probably need to increase session.timeout.ms. There's
> > > > >> not really any downside to doing so other than that hard failures (in
> > > > >> which the consumer can't be shut down cleanly) will take a little
> > > > >> longer to detect. Normal shutdown doesn't have this problem. It can
> > > > >> be difficult in 0.9 to ensure that poll() is called often enough
> > > > >> since you don't have direct control over the amount of data returned
> > > > >> in poll(), but we're adding an option (max.poll.records) in 0.10
> > > > >> which hopefully can be set conservatively enough to make this problem
> > > > >> go away.
> > > > >>
> > > > >> -Jason
> > > >>

Re: No Heartbeat request on commit

Posted by Jason Gustafson <ja...@confluent.io>.

Ah, yeah. That's probably caused by the new topic metadata version, which
isn't supported on 0.9 brokers. To test on trunk, you'd have to upgrade the
brokers as well. Either that or you can rewind to before KAFKA-3306 (which
was just committed the day before yesterday)?

-Jason
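
For anyone puzzled by the SchemaException quoted below, here is a minimal
sketch of how a response written with one field layout and parsed with
another can fail in this way. The layouts are hypothetical (the extra string
field, the host names and the class name are made up for illustration), and
the reader and writer are stand-ins rather than Kafka's actual Schema
classes. The only point is that a length prefix read from the wrong position
can claim far more bytes than the buffer holds.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not Kafka's real wire format or classes.
final class SchemaMismatchSketch {

    /** Writes a 2-byte length prefix followed by the UTF-8 bytes. */
    static void writeString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putShort((short) bytes.length);
        buf.put(bytes);
    }

    /** Reads a length-prefixed string and rejects prefixes larger than the remaining bytes. */
    static String readString(ByteBuffer buf) {
        short length = buf.getShort();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalStateException("Error reading string of length " + length
                    + ", only " + buf.remaining() + " bytes available");
        }
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Assumed "new" layout per broker: id, host, port, plus one extra string field. */
    static ByteBuffer encodeNewLayout() {
        ByteBuffer buf = ByteBuffer.allocate(256);
        for (int id = 0; id < 2; id++) {
            buf.putInt(id);
            writeString(buf, "broker" + id + ".example.com");
            buf.putInt(9092);
            writeString(buf, "rack-a"); // field the old reader knows nothing about
        }
        buf.flip();
        return buf;
    }

    /** Assumed "old" layout per broker: id, host, port only. */
    static void decodeOldLayout(ByteBuffer buf) {
        for (int i = 0; i < 2; i++) {
            int id = buf.getInt();
            String host = readString(buf); // misaligned on the second broker
            int port = buf.getInt();
            System.out.println("broker " + id + " at " + host + ":" + port);
        }
    }

    public static void main(String[] args) {
        try {
            decodeOldLayout(encodeNewLayout());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The first broker decodes cleanly because both layouts agree up to the port.
The second one fails because the old reader treats bytes of the unknown
field as the next id and length prefix.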

On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vi...@gmail.com>
wrote:

> Hi Jason,
>
> I built kafka-client and tried using it, but my producers and consumers
> started throwing the exception below. Is 0.10 not going to be compatible with
> brokers on version 0.9.0.1? Or do I need to make some config changes to the
> producers / consumers to make them compatible with brokers on the old version?
> Or do I need to upgrade the brokers to the new version as well?
>
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 17995, only 145 bytes available
>     at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
>     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
>     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
>
> Regards,
> Vinay Sharma

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Hi Jason,

I built kafka-client and tried using it, but my producers and consumers
started throwing the exception below. Is 0.10 not going to be compatible with
brokers on version 0.9.0.1? Or do I need to make some config changes to the
producers / consumers to make them compatible with brokers on the old version?
Or do I need to upgrade the brokers to the new version as well?

org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 17995, only 145 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)

Regards,
Vinay Sharma

On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hey Vinay,
>
> Any chance you can run the same test against trunk? I'm guessing this might
> be caused by a bug in the 0.9 consumer which basically causes some requests
> to fail when a bunch of them are sent to the broker at the same time.
>
> -Jason
>
> On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <vi...@gmail.com>
> wrote:
>
> > Hi Jason,
> >
> > This makes sense. We use 0.9.0.1 and we do have the session timeout set a
> > bit high, but nothing can guarantee that processing will never take longer
> > than the session timeout. I am trying to test a proactive commit approach to
> > handle such cases, when processing takes an unusually long time. To keep the
> > consumer's session alive during long processing, I proactively commitSync
> > processed records every 15 seconds. The session timeout I kept is 30000.
> >
> > *Problem:-*
> > With the heartbeat interval at 3000, I expect a heartbeat request to be sent
> > on each proactive commit, which happens every 15 seconds. In my tests I see
> > that this does not always happen. I see a time window of more than 30
> > seconds in which no heartbeat is sent even though there were commits during
> > it. After this window I see a couple of successful heartbeat responses until
> > the end of the poll, but as soon as I poll again and call commitSync in the
> > next poll I get an "ILLEGAL_GENERATION" error. This error always happens
> > just after a metadata refresh, or in the next poll's processing after a
> > metadata refresh. I am attaching logs where I kept the metadata refresh
> > interval at 40000, 90000 and 500000.
> >
> > *Test results:-*
> > Test with meta refresh 40000 ms ran around 70 seconds from 1st poll.
> > Test with meta refresh 90000 ms ran around 120 seconds from 1st poll.
> > Test with meta refresh 500000 ms ran around 564 seconds from 1st poll.
> >
> > Every test falls in line with the above cases: the generation is marked dead
> > some time after a metadata refresh. A metadata refresh before the 1st poll
> > does not create any issue, but the ones after a poll and during long
> > processing do.
> >
> > *Environment:-*
> > My setup has 3 brokers and 1 ZooKeeper node. The topic has 3 partitions and
> > a replication factor of 3. Messages are already published to the topic.
> >
> > *Logic used in test cases:-*
> > On each poll I initialize a map with the current committed offset position
> > of the partitions being consumed. I update this map after each record is
> > processed and use it to proactively commit every 15 seconds. The map is
> > initialized again after a proactive commit.
> >
> > I am not sure what is wrong here, but I do not see any issue in the code or
> > in the offset commits. Log files and a class with a main method are
> > attached for your reference.
> >
> > Regards,
> > Vinay Sharma
> >
> >
> >
> > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> >> Hi Vinay,
> >>
> >> Answers below:
> >>
> >> > 1) Is it correct to say that each commitSync will trigger a
> >> > HeartBeatTask? If no heartbeat has been sent within the specified
> >> > heartbeat interval, should I see a successful heartbeat response or a
> >> > failure message in the logs near the commitSync success log?
> >>
> >> Not quite. Heartbeats are sent periodically according to the
> >> heartbeat.interval.ms configuration. However, since the consumer has no
> >> background thread, they can only be sent in API calls such as poll() or
> >> commitSync(). So calling commitSync() may or may not result in a heartbeat
> >> depending only on whether one is "due."
> >>
> >> > 2) Is it correct to say that a metadata refresh will not act as a
> >> > heartbeat, will not trigger heartBeatTask and will not reset
> >> > heartBeatTask?
> >>
> >> That is correct. Metadata refreshes are not related to heartbeats.
> >>
> >> > 3) Where is a consumer session maintained? Let's say my consumer is
> >> > listening to 3 partitions on a 3-broker cluster where each broker is the
> >> > leader of 1 partition. Will each of the brokers have a session for my
> >> > consumer, or is there just 1 session maintained somewhere in common,
> >> > like ZooKeeper?
> >>
> >> One of the brokers serves as the "group coordinator." When the consumer
> >> starts up, it sends a GroupCoordinator request to one of the brokers to
> >> find out who the coordinator is. Currently, coordinators are chosen from
> >> among the leaders of the partitions of the __consumer_offsets topic. This
> >> lets us take advantage of the leader election process to also handle
> >> coordinator failures. The coordinator of each group maintains state for
> >> the group and keeps track of session timeouts.
> >>
> >> > 4) In the above setup, if during long processing I commit a record
> >> > through commitSync, which triggers a heartbeat request, and a successful
> >> > response is received, what does this response mean? Does it mean that my
> >> > session with each broker is renewed? Or does it mean that just the
> >> > leader for the partition of the committed record knows that my consumer
> >> > is alive, and the consumer's session on other brokers will still time
> >> > out?
> >>
> >> The coordinator is the only broker that is aware of a consumer's session
> >> and all offset commits are sent to it. Successful heartbeats mean that the
> >> session is still active. Heartbeats are also used to let the consumer
> >> discover when a rebalance has begun. If a new member joins the group, then
> >> the coordinator returns an error code in the heartbeat responses of the
> >> active members to let them know that they need to rejoin the group so that
> >> partitions can be rebalanced.
> >>
> >> I wouldn't get too hung up on commit/heartbeat behavior. The crux of the
> >> issue is that you need to call poll() often enough to avoid getting timed
> >> out by the coordinator. If you find this happening frequently, you
> >> probably need to increase session.timeout.ms. There's not really any
> >> downside to doing so other than that hard failures (in which the consumer
> >> can't be shut down cleanly) will take a little longer to detect. Normal
> >> shutdown doesn't have this problem. It can be difficult in 0.9 to ensure
> >> that poll() is called often enough since you don't have direct control
> >> over the amount of data returned in poll(), but we're adding an option
> >> (max.poll.records) in 0.10 which hopefully can be set conservatively
> >> enough to make this problem go away.
> >>
> >> -Jason
> >>
> >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma <vi...@gmail.com>
> >> wrote:
> >>
> >> > Hey,
> >> >
> >> > I am working on a simplified test case to check whether there is any
> >> > issue in my code. Just to make sure that none of my assumptions are
> >> > wrong, it would be great if you could help me find answers to the
> >> > following queries:-
> >> >
> >> > 1) Is it correct to say that each commitSync will trigger a
> >> > HeartBeatTask? If no heartbeat has been sent within the specified
> >> > heartbeat interval, should I see a successful heartbeat response or a
> >> > failure message in the logs near the commitSync success log?
> >> > 2) Is it correct to say that a metadata refresh will not act as a
> >> > heartbeat, will not trigger heartBeatTask and will not reset
> >> > heartBeatTask?
> >> > 3) Where is a consumer session maintained? Let's say my consumer is
> >> > listening to 3 partitions on a 3-broker cluster where each broker is the
> >> > leader of 1 partition. Will each of the brokers have a session for my
> >> > consumer, or is there just 1 session maintained somewhere in common,
> >> > like ZooKeeper?
> >> > 4) In the above setup, if during long processing I commit a record
> >> > through commitSync, which triggers a heartbeat request, and a successful
> >> > response is received, what does this response mean? Does it mean that my
> >> > session with each broker is renewed? Or does it mean that just the
> >> > leader for the partition of the committed record knows that my consumer
> >> > is alive, and the consumer's session on other brokers will still time
> >> > out?
> >> >
> >> > Regards,
> >> > Vinay Sharma
> >> >
> >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson <ja...@confluent.io>
> >> > wrote:
> >> >
> >> > > Hey Vinay,
> >> > >
> >> > > Are you saying that heartbeats are not sent while a metadata refresh is
> >> > > in progress? Do you have any logs which show us the apparent problem?
> >> > >
> >> > > Thanks,
> >> > > Jason
> >> > >
> >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <vi...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Ismael,
> >> > > >
> >> > > > Treating commitSync as a heartbeat will definitely resolve the issue
> >> > > > I am facing, but the reason behind my issue does not seem to be what
> >> > > > is mentioned in the defect (i.e. frequent commitSync requests).
> >> > > >
> >> > > > I am sending commitSync periodically only to keep my session alive
> >> > > > when my consumer is still processing records and is close to the
> >> > > > session timeout (I tried the 10th / 12th / 15th / 20th second after
> >> > > > poll was called, where the session timeout is 30 seconds). I see a
> >> > > > heartbeat response received in the logs along with each commitSync
> >> > > > call, but this stops after a metadata refresh request is issued. I
> >> > > > see in the logs that the commit succeeds, but there is no "heartbeat
> >> > > > response received" message in the logs after the metadata refresh
> >> > > > until the next poll.
> >> > > >
> >> > > > Regards,
> >> > > > Vinay Sharma
> >> > > >
> >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <is...@juma.me.uk>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Vinay,
> >> > > > >
> >> > > > > This was fixed via https://issues.apache.org/jira/browse/KAFKA-3470
> >> > > > > (will be part of 0.10.0.0).
> >> > > > >
> >> > > > > Ismael
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma <vi...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hello,
> >> > > > > >
> >> > > > > > I am using client API 0.9.0.1 and facing an issue. As per my
> >> > > > > > logs, it seems that on each commitSync(offsets) a heartbeat
> >> > > > > > request is sent, but after a metadata refresh request, until the
> >> > > > > > next poll(), commits do not send any heartbeat request.
> >> > > > > >
> >> > > > > > The KafkaConsumers I create sometimes get a session timeout due
> >> > > > > > to no heartbeat, especially during longer processing times. I
> >> > > > > > call commitSync(offsets) at regular intervals to keep the session
> >> > > > > > alive when processing takes longer than usual. Everything works
> >> > > > > > fine if the commit intervals are very small or if I commit after
> >> > > > > > each record, but if I commit, let's say, every 12 seconds with a
> >> > > > > > session timeout of 30 seconds, then I can see the consumer
> >> > > > > > getting timed out sometimes.
> >> > > > > >
> >> > > > > > Any help or pointers will be much appreciated. Thanks in advance.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Vinay sharma
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: No Heartbeat request on commit

Posted by Jason Gustafson <ja...@confluent.io>.

Hey Vinay,

Any chance you can run the same test against trunk? I'm guessing this might
be caused by a bug in the 0.9 consumer which basically causes some requests
to fail when a bunch of them are sent to the broker at the same time.

-Jason

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Hi Jason,

This makes sense. We use 0.9.0.1 and have the session timeout set fairly
high, but nothing guarantees that processing will never take longer than
the session timeout. I am trying to test a proactive commit approach to
handle the cases where processing takes an unusually long time. To keep the
consumer's session alive during long processing, I proactively commitSync
the processed records every 15 seconds. The session timeout I kept is
30000 ms.

*Problem:-*
With a heartbeat interval of 3000 ms, I expect a heartbeat request to be
sent on each proactive commit, which happens every 15 seconds. In my tests
this does not always happen. I see a time window of more than 30 seconds in
which no heartbeat is sent even though there were commits during it. After
this window I see a couple of successful heartbeat responses until the end
of processing for that poll, but as soon as I poll again and call
commitSync I get an "ILLEGAL_GENERATION" error. This error always happens
either just after a metadata refresh or while processing the next poll
after a metadata refresh. I am attaching logs from runs where I set the
metadata refresh interval to 40000, 90000 and 500000 ms.

*Test results:-*
Test with metadata refresh 40000 ms ran for around 70 seconds from the 1st poll.
Test with metadata refresh 90000 ms ran for around 120 seconds from the 1st poll.
Test with metadata refresh 500000 ms ran for around 564 seconds from the 1st poll.

Every test follows the same pattern: the generation is marked dead some
time after a metadata refresh. A metadata refresh before the 1st poll does
not cause any issue, but the ones after a poll and during long processing do.

*Environment:-*
My setup has 3 brokers and 1 ZooKeeper node. The topic has 3 partitions and
a replication factor of 3. Messages are already published to the topic.

*Logic used in test cases:-*
On each poll I initialize a map with the current committed offset of each
partition being consumed. I update this map after processing each record
and use it to proactively commit every 15 seconds. The map is initialized
again after a proactive commit.
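
The bookkeeping described above can be sketched roughly as follows. This is
a minimal model using only the Java standard library, not the attached test
class: ProactiveCommitTracker and recordProcessed are hypothetical names,
the Consumer callback stands in for commitSync(offsets), and partitions are
plain integers instead of TopicPartition objects. It assumes the committed
offset should be the offset of the next record to read (last processed + 1).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: tracks processed offsets and commits them at a fixed interval. */
class ProactiveCommitTracker {
    private final long commitIntervalMs;
    // Stand-in for a call such as commitSync(offsets); an assumption for this sketch.
    private final Consumer<Map<Integer, Long>> committer;
    private final Map<Integer, Long> pending = new HashMap<>();
    private long lastCommitMs;

    ProactiveCommitTracker(long commitIntervalMs, long nowMs,
                           Consumer<Map<Integer, Long>> committer) {
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = nowMs;
        this.committer = committer;
    }

    /** Records one processed record; returns true if a commit was issued. */
    boolean recordProcessed(int partition, long offset, long nowMs) {
        // Store the position of the next record to read, not the one just processed.
        pending.put(partition, offset + 1);
        if (nowMs - lastCommitMs < commitIntervalMs) {
            return false;
        }
        // Hand over a copy so that clearing the map does not affect the committer.
        committer.accept(new HashMap<>(pending));
        pending.clear();
        lastCommitMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        ProactiveCommitTracker tracker = new ProactiveCommitTracker(
                15_000, 0, offsets -> System.out.println("commit " + offsets));
        tracker.recordProcessed(0, 10, 5_000);   // interval not yet elapsed
        tracker.recordProcessed(0, 11, 15_000);  // interval elapsed, commit issued
    }
}
```

Passing the clock value in explicitly keeps the interval logic easy to check
without real waiting; a real loop would pass System.currentTimeMillis().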

I am not sure what is wrong here, but I do not see any issue in the code or
in the offset commits. Log files and a class with a main method are
attached for your reference.

Regards,
Vinay Sharma



Re: No Heartbeat request on commit

Posted by Jason Gustafson <ja...@confluent.io>.

Hi Vinay,

Answers below:

> 1) Is it correct to say that each commitSync will trigger a HeartBeatTask?
> If no heartbeat has been sent within the specified heartbeat interval,
> should I see a successful heartbeat response or a failure message in the
> logs near the commitSync success log?


Not quite. Heartbeats are sent periodically according to the
heartbeat.interval.ms configuration. However, since the consumer has no
background thread, they can only be sent in API calls such as poll() or
commitSync(). So calling commitSync() may or may not result in a heartbeat
depending only on whether one is "due."
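
As an illustration of the "due" check, here is a minimal sketch in plain
Java. It is a simplified model and not the client's actual implementation:
HeartbeatTimer and its methods are hypothetical names, and time is passed in
explicitly. It shows that a heartbeat can only go out when some API call
reaches the client, and only if the heartbeat interval has elapsed since the
previous one.

```java
/** Hypothetical model of a heartbeat timer that is driven only by client API calls. */
class HeartbeatTimer {
    private final long heartbeatIntervalMs;
    private final long sessionTimeoutMs;
    private long lastHeartbeatMs;

    HeartbeatTimer(long heartbeatIntervalMs, long sessionTimeoutMs, long nowMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastHeartbeatMs = nowMs;
    }

    /** True when at least one heartbeat interval has passed since the last heartbeat. */
    boolean isDue(long nowMs) {
        return nowMs - lastHeartbeatMs >= heartbeatIntervalMs;
    }

    /** True when more than the session timeout has passed without a heartbeat. */
    boolean sessionExpired(long nowMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }

    /** Models an API call such as poll() or commitSync(); returns whether a heartbeat was sent. */
    boolean onApiCall(long nowMs) {
        if (!isDue(nowMs)) {
            return false;
        }
        lastHeartbeatMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        // heartbeat.interval.ms = 3000 and session.timeout.ms = 30000, as in this thread.
        HeartbeatTimer timer = new HeartbeatTimer(3_000, 30_000, 0);
        System.out.println(timer.onApiCall(1_000));       // too early, nothing is due
        System.out.println(timer.onApiCall(15_000));      // due, heartbeat goes out
        System.out.println(timer.sessionExpired(46_000)); // no calls for 31 s after that
    }
}
```

In this model, calling commitSync() every 15 seconds always finds a
heartbeat due, which matches the expectation that each proactive commit
should be accompanied by one.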

> 2) Is it correct to say that a metadata refresh will not act as a heartbeat,
> will not trigger heartBeatTask and will not reset heartBeatTask?


That is correct. Metadata refreshes are not related to heartbeats.

> 3) Where is a consumer session maintained? Let's say my consumer is
> listening to 3 partitions on a 3 broker cluster where each broker is leader
> of 1 partition. Will each of the brokers have a session for my
> consumer, or is there just 1 session maintained somewhere in common, like
> ZooKeeper?


One of the brokers serves as the "group coordinator." When the consumer
starts up, it sends a GroupCoordinator request to one of the brokers to
find out who the coordinator is. Currently, coordinators are chosen from
among the leaders of the partitions of the __consumer_offsets topic. This
lets us take advantage of the leader election process to also handle
coordinator failures. The coordinator of each group maintains state for the
group and keeps track of session timeouts.
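
A rough sketch of that lookup, under stated assumptions: the group id is
hashed onto one partition of __consumer_offsets, and the current leader of
that partition acts as coordinator. The exact hashing scheme shown here is
an assumption for illustration, and CoordinatorLookup, its methods and the
broker ids are hypothetical.

```java
/** Hypothetical sketch: mapping a consumer group to its coordinator broker. */
class CoordinatorLookup {

    /** Maps a group id onto one partition of the offsets topic. */
    static int offsetsPartitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so handle it separately.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsTopicPartitions;
    }

    /** Returns the broker id leading the group's offsets partition. */
    static int coordinatorFor(String groupId, int[] partitionLeaders) {
        return partitionLeaders[offsetsPartitionFor(groupId, partitionLeaders.length)];
    }

    public static void main(String[] args) {
        // Illustrative layout: 3 offsets-topic partitions led by brokers 1, 2 and 3.
        int[] leaders = {1, 2, 3};
        System.out.println(coordinatorFor("my-group", leaders));
    }
}
```

Because the coordinator is simply the leader of a partition, a coordinator
failure is handled by the ordinary leader election for that partition: the
same group id then resolves to whichever broker becomes the new leader.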

> 4) In the above setup, if I commit a record through commitSync during long
> processing, and this triggers a heartbeat request for which a successful
> response is received, what does this response mean? Does it mean that
> my session with each broker is renewed? Or does it mean that only the
> leader for the partition of the committed record knows that my consumer is
> alive, and the consumer's session on other brokers will still time out?


The coordinator is the only broker that is aware of a consumer's session
and all offset commits are sent to it. Successful heartbeats mean that the
session is still active. Heartbeats are also used to let the consumer
discover when a rebalance has begun. If a new member joins the group, then
the coordinator returns an error code in the heartbeat responses of the
active members to let them know that they need to rejoin the group so that
partitions can be rebalanced.

I wouldn't get too hung up on commit/heartbeat behavior. The crux of the
issue is that you need to call poll() often enough to avoid getting timed
out by the coordinator. If you find this happening frequently, you probably
need to increase session.timeout.ms. There's not really any downside to
doing so other than that hard failures (in which the consumer can't be
shut down cleanly) will take a little longer to detect. Normal shutdown
doesn't have this problem. It can be difficult in 0.9 to ensure that poll()
is called often enough since you don't have direct control over the amount
of data returned in poll(), but we're adding an option (max.poll.records)
in 0.10 which hopefully can be set conservatively enough to make this
problem go away.
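
For reference, a minimal sketch of how these settings could be put together
as consumer properties. The property keys are the real consumer
configuration names; the values, the bootstrap address, the group id and the
class name ConsumerTimeoutConfig are illustrative assumptions only. Note
that the broker bounds the allowed session timeout through
group.min.session.timeout.ms and group.max.session.timeout.ms, so a larger
value may also need a broker-side change.

```java
import java.util.Properties;

/** Hypothetical sketch: consumer settings relevant to session timeouts. */
class ConsumerTimeoutConfig {

    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("group.id", "example-group");           // illustrative group id
        // A larger session timeout tolerates longer gaps between poll() calls,
        // at the cost of slower detection of hard failures.
        props.put("session.timeout.ms", "60000");
        // Heartbeats should be attempted several times per session timeout.
        props.put("heartbeat.interval.ms", "3000");
        // Only available from 0.10 on: caps the records returned by one poll().
        props.put("max.poll.records", "100");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```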

-Jason

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Hey,

I am working on a simplified test case to check whether there is any issue
in my code. To make sure that none of my assumptions are wrong, it would be
great if you could help me find answers to the following queries:-

1) Is it correct to say that each commitSync will trigger a HeartBeatTask?
If no heartbeat has been sent within the specified heartbeat interval,
should I see a successful heartbeat response or a failure message in the
logs near the commitSync success log?
2) Is it correct to say that a metadata refresh will not act as a heartbeat,
will not trigger heartBeatTask and will not reset heartBeatTask?
3) Where is a consumer session maintained? Let's say my consumer is
listening to 3 partitions on a 3 broker cluster where each broker is leader
of 1 partition. Will each of the brokers have a session for my consumer, or
is there just 1 session maintained somewhere in common, like ZooKeeper?
4) In the above setup, if I commit a record through commitSync during long
processing, and this triggers a heartbeat request for which a successful
response is received, what does this response mean? Does it mean that my
session with each broker is renewed? Or does it mean that only the leader
for the partition of the committed record knows that my consumer is alive,
and the consumer's session on other brokers will still time out?

Regards,
Vinay Sharma

Re: No Heartbeat request on commit

Posted by Jason Gustafson <ja...@confluent.io>.

Hey Vinay,

Are you saying that heartbeats are not sent while a metadata refresh is in
progress? Do you have any logs which show us the apparent problem?

Thanks,
Jason

On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <vi...@gmail.com>
wrote:

> Hi Ismael,
>
> Treating commitSync as a heartbeat will definitely resolve the issue I am
> facing, but the reason behind my issue does not seem to be what is mentioned
> in the defect (i.e. frequent commitSync requests).
>
> I am sending commitSync periodically only to keep my session alive when my
> consumer is still processing records and is close to the session timeout
> (I tried the 10th / 12th / 15th / 20th second after poll was called, where
> the session timeout is 30 seconds). I see a heartbeat response received in
> the logs along with each commitSync call, but this stops after a metadata
> refresh request is issued. I see in the logs that the commit succeeds, but
> there is no "heartbeat response received" message in the logs after the
> metadata refresh until the next poll.
>
> Regards,
> Vinay Sharma

Re: No Heartbeat request on commit

Posted by vinay sharma <vi...@gmail.com>.

Hi Ismael,

Treating commitSync as a heartbeat will definitely resolve the issue I am
facing, but the reason behind my issue does not seem to be what is mentioned
in the defect (i.e. frequent commitSync requests).

I am sending commitSync periodically only to keep my session alive when my
consumer is still processing records and is close to the session timeout
(I tried the 10th / 12th / 15th / 20th second after poll was called, where
the session timeout is 30 seconds). I see a heartbeat response received in
the logs along with each commitSync call, but this stops after a metadata
refresh request is issued. I see in the logs that the commit succeeds, but
there is no "heartbeat response received" message in the logs after the
metadata refresh until the next poll.
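
The timing described above can be checked with a toy model. This is only a
sketch of the arithmetic, assuming that each heartbeat pushes the session
deadline to "heartbeat time + session timeout" and that time 0 is the last
poll. The class name, method name, and the 70-second processing window are
hypothetical; the 12-second commit interval and 30-second session timeout
are the values from this thread. It does not use or reproduce the Kafka
client.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of a session timer, not Kafka source code.
final class SessionTimerSketch {

    // Returns the second at which the session would expire, or -1 if it
    // survives until endSec (the moment of the next poll).
    // heartbeatTimesSec must be sorted ascending and lie within [0, endSec].
    static long firstExpiry(long sessionTimeoutSec, List<Long> heartbeatTimesSec, long endSec) {
        long deadline = sessionTimeoutSec; // session last renewed at t = 0
        for (long t : heartbeatTimesSec) {
            if (t > deadline) {
                return deadline; // this heartbeat arrived too late
            }
            deadline = t + sessionTimeoutSec; // heartbeat renews the session
        }
        return deadline < endSec ? deadline : -1;
    }

    public static void main(String[] args) {
        // A heartbeat goes out with every commit (every 12 s) for 70 s of processing.
        List<Long> everyCommit = Arrays.asList(12L, 24L, 36L, 48L, 60L);
        System.out.println(firstExpiry(30, everyCommit, 70));

        // Heartbeats stop after the commit at 24 s, e.g. after a metadata refresh.
        List<Long> stoppedEarly = Arrays.asList(12L, 24L);
        System.out.println(firstExpiry(30, stoppedEarly, 70));
    }
}
```

Under these assumptions, commits that keep carrying heartbeats never let the
deadline lapse, while a run in which heartbeats stop after the 24-second
commit expires at 54 seconds even though later commits succeed.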

Regards,
Vinay Sharma

On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <is...@juma.me.uk> wrote:

> Hi Vinay,
>
> This was fixed via https://issues.apache.org/jira/browse/KAFKA-3470 (will
> be part of 0.10.0.0).
>
> Ismael

Re: No Heartbeat request on commit

Posted by Ismael Juma <is...@juma.me.uk>.

Hi Vinay,

This was fixed via https://issues.apache.org/jira/browse/KAFKA-3470 (will
be part of 0.10.0.0).

Ismael



On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma <vi...@gmail.com>
wrote:

> Hello,
>
> I am using client API 0.9.0.1 and facing an issue. As per my logs, it seems
> that on each commitSync(offsets) a heartbeat request is sent, but after a
> metadata refresh request, until the next poll(), commits do not send any
> heartbeat request.
>
> The KafkaConsumers I create sometimes get a session timeout due to no
> heartbeat, especially during longer processing times. I call
> commitSync(offsets) at regular intervals to keep the session alive when
> processing takes longer than usual. Everything works fine if commit
> intervals are very small or if I commit after each record, but if I commit,
> let's say, every 12 seconds and the session timeout is 30 seconds, then I
> can see the consumer getting timed out sometimes.
>
> Any help or pointers will be much appreciated. Thanks in advance.
>
> Regards,
> Vinay sharma
>