Posted to users@kafka.apache.org by Shantanu Deshmukh <sh...@gmail.com> on 2018/06/14 09:09:41 UTC

Frequent "offset out of range" messages, partitions deserted by consumer

We have a consumer application in which a single consumer group subscribes
to multiple topics. We are seeing strange behaviour in the consumer logs,
starting with these lines:

 Fetch offset 1109143 is out of range for partition otp-email-4, resetting offset
 Fetch offset 952168 is out of range for partition otp-email-7, resetting offset
 Fetch offset 945796 is out of range for partition otp-email-5, resetting offset
 Fetch offset 950900 is out of range for partition otp-email-0, resetting offset
 Fetch offset 953163 is out of range for partition otp-email-3, resetting offset
 Fetch offset 1118389 is out of range for partition otp-email-6, resetting offset
 Fetch offset 1112177 is out of range for partition otp-email-2, resetting offset
 Fetch offset 1109539 is out of range for partition otp-email-1, resetting offset
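
A minimal sketch for summarising such messages per topic and partition,
which may help show whether the resets are confined to one topic. It uses
only the Python standard library. The function name `resets_by_topic` and
the sample text are illustrative and not part of any Kafka tooling.

```python
import re
from collections import defaultdict

# Matches the consumer log message quoted above. \s+ tolerates entries
# that a mail client or log viewer has wrapped across lines.
RESET_RE = re.compile(
    r"Fetch offset (\d+) is out of range for partition "
    r"([\w.-]+)-(\d+),\s+resetting\s+offset"
)


def resets_by_topic(log_text):
    """Return {topic: {partition: [fetch offsets reported out of range]}}."""
    found = defaultdict(lambda: defaultdict(list))
    for offset, topic, partition in RESET_RE.findall(log_text):
        found[topic][int(partition)].append(int(offset))
    return {topic: dict(parts) for topic, parts in found.items()}


# Two entries copied from the log excerpt above, one wrapped and one not.
sample = """
 Fetch offset 1109143 is out of range for partition otp-email-4, resetting
offset
 Fetch offset 952168 is out of range for partition otp-email-7, resetting offset
"""
summary = resets_by_topic(sample)
print(summary)
```

The topic name is taken to be everything before the final `-<number>`,
which is an assumption about how the client prints a topic-partition.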

Some time later we saw these logs:

[2018-06-08 19:45:28] :: INFO  :: ConsumerCoordinator:333 - Revoking previously assigned partitions [bulk-email-4, bulk-email-3, bulk-email-0, bulk-email-2, bulk-email-1] for group notifications-consumer
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator:381 - (Re-)joining group notifications-consumer
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: ConsumerCoordinator:225 - Setting newly assigned partitions [bulk-email-8, bulk-email-7, bulk-email-9, bulk-email-6, bulk-email-5] for group notifications-consumer
[2018-06-08 19:45:28] :: INFO  :: ConsumerCoordinator:225 - Setting newly assigned partitions [transactional-sms-3, transactional-sms-2, transactional-sms-1, transactional-sms-0] for group notifications-consumer
[2018-06-08 19:45:28] :: INFO  :: ConsumerCoordinator:225 - Setting newly assigned partitions [transactional-sms-9, transactional-sms-8, transactional-sms-7] for group notifications-consumer

I noticed that one of our topics did not appear in the list of *Setting newly
assigned partitions*. That topic then had no consumers attached to it for at
least 8 hours. It only started being consumed again when someone restarted
the application. What could be going wrong here?

Here is the consumer config:

auto.commit.interval.ms = 3000
auto.offset.reset = latest
bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = otp-notifications-consumer
heartbeat.interval.ms = 3000
interceptor.classes = null
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 50
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
session.timeout.ms = 300000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = /x/x/client.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
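
To make the timing values in this dump easier to reason about, here is a
small sketch that parses `key = value` lines and converts a few of them to
minutes. It uses only the Python standard library. The helper name
`parse_config_dump` is illustrative. The one-third heartbeat check reflects
the guidance in the Kafka consumer documentation that
`heartbeat.interval.ms` should typically be no higher than one third of
`session.timeout.ms`.

```python
def parse_config_dump(text):
    """Parse 'key = value' lines; a line without '=' continues the previous value."""
    config, last_key = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, _, value = line.partition("=")
            last_key = key.strip()
            config[last_key] = value.strip()
        elif last_key is not None:
            # Wrapped value, e.g. a class name pushed onto the next line.
            config[last_key] = (config[last_key] + " " + line).strip()
    return config


# A few entries copied from the dump above.
dump = """
heartbeat.interval.ms = 3000
max.poll.interval.ms = 300000
session.timeout.ms = 300000
key.deserializer = class
org.apache.kafka.common.serialization.StringDeserializer
client.id =
"""
cfg = parse_config_dump(dump)
session_ms = int(cfg["session.timeout.ms"])
heartbeat_ms = int(cfg["heartbeat.interval.ms"])

session_minutes = session_ms / 60_000
heartbeat_ok = heartbeat_ms * 3 <= session_ms
print(session_minutes, heartbeat_ok)
```

With these values the session timeout works out to 5 minutes, so a consumer
that stops heartbeating may not be detected as dead for up to that long.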

The topic that was orphaned has 10 partitions, with retention.ms=1800000
and segment.ms=1800000.
Please help.
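
For context, `retention.ms=1800000` is 30 minutes. When the broker deletes
old segments, the start of the log moves forward. A fetch position that
falls below the new start (or beyond the end) is reported as out of range,
and the consumer then applies `auto.offset.reset`. The sketch below models
that decision in plain Python. It is not the client's actual code, the
thread does not establish that this is the cause here, and the log start
and end offsets in the example are hypothetical.

```python
def ms_to_minutes(ms):
    """Convert a millisecond setting to minutes."""
    return ms / 60_000


def next_fetch_position(position, log_start, log_end, auto_offset_reset="latest"):
    """Model where a consumer fetches next, given the partition's valid range."""
    if log_start <= position <= log_end:
        return position  # still within the log, nothing to reset
    if auto_offset_reset == "latest":
        return log_end  # skip to the end; records in between are not read
    if auto_offset_reset == "earliest":
        return log_start  # resume from the oldest retained record
    raise LookupError("offset out of range and no reset policy configured")


print(ms_to_minutes(1_800_000))

# 1109143 is taken from the log excerpt; the log range is made up.
print(next_fetch_position(1109143, log_start=1200000, log_end=1200500))
print(next_fetch_position(1109143, 1200000, 1200500, auto_offset_reset="earliest"))
```

With `auto.offset.reset = latest`, as in the config above, any records
between the old position and the log end would be skipped after a reset.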

Thanks & Regards,

Shantanu Deshmukh

Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Shantanu Deshmukh <sh...@gmail.com>.
The consumer is always consuming. There is a constant trickle of messages.
However, between 1am and 5am there are almost no messages.

On Wed, Jun 20, 2018 at 11:31 AM Liam Clarke <li...@adscale.co.nz>
wrote:

>  How often is the consumer actually consuming? I know there's an issue
> where old committed offsets expire after a period of time.

Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Liam Clarke <li...@adscale.co.nz>.
 How often is the consumer actually consuming? I know there's an issue
where old committed offsets expire after a period of time.
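
The expiry mentioned here is controlled on the broker by
`offsets.retention.minutes`. As a rough sketch of the arithmetic, assuming
a retention of 1440 minutes (24 hours, which I believe was the default
before Kafka 2.0) and assuming that expiry is measured from the last commit
for the partition:

```python
from datetime import datetime, timedelta


def committed_offset_expired(last_commit, now, retention_minutes=1440):
    """True if the gap since the last commit exceeds the offsets retention."""
    return now - last_commit > timedelta(minutes=retention_minutes)


# Hypothetical timestamps spanning a 4-hour gap (1am to 5am).
quiet_start = datetime(2018, 6, 20, 1, 0)
quiet_end = datetime(2018, 6, 20, 5, 0)
print(committed_offset_expired(quiet_start, quiet_end))

# A gap of 25 hours would exceed the assumed 24-hour retention.
print(committed_offset_expired(quiet_start, quiet_start + timedelta(hours=25)))
```

Under these assumptions, a gap of a few hours between commits would not be
enough on its own to expire a committed offset.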

On Wed, 20 Jun. 2018, 5:46 pm Shantanu Deshmukh, <sh...@gmail.com>
wrote:

> It is happening via auto-commit. The frequency is 3000 ms.

Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Shantanu Deshmukh <sh...@gmail.com>.
It is happening via auto-commit. The frequency is 3000 ms.

On Wed, Jun 20, 2018 at 10:31 AM Liam Clarke <li...@adscale.co.nz>
wrote:

> How frequently are your consumers committing offsets?

Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Liam Clarke <li...@adscale.co.nz>.
How frequently are your consumers committing offsets?

On Wed, 20 Jun. 2018, 4:52 pm Shantanu Deshmukh, <sh...@gmail.com>
wrote:

> I desperately need help. We have been facing this issue in production for
> a while now. Someone please help me out.

Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Shantanu Deshmukh <sh...@gmail.com>.
I desperately need help. We have been facing this issue in production for
a while now. Someone please help me out.

On Fri, Jun 15, 2018 at 2:02 AM Lawrence Weikum <lw...@pandora.com> wrote:

> unsubscribe
>
>

Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Lawrence Weikum <lw...@pandora.com>.
unsubscribe 


Re: Frequent "offset out of range" messages, partitions deserted by consumer

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Any help please.
