Posted to dev@samza.apache.org by 李斯宁 <li...@gmail.com> on 2016/08/21 11:38:41 UTC

Samza container hang on exception

Hi, guys,
I'm using Samza for real-time processing. After running for about 10 hours,
some containers paused and stopped processing.

When I looked into the log, I found many entries like these:

2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
Got error produce response with correlation id 490345 on
topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying
(17 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
Got error produce response with correlation id 490345 on
topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying
(18 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
Got error produce response with correlation id 490345 on
topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying
(18 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
Got error produce response with correlation id 490346 on
topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying
(16 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
Got error produce response with correlation id 490346 on
topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying
(17 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
Got error produce response with correlation id 490346 on
topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying
(17 attempts left). Error: NOT_LEADER_FOR_PARTITION

...

2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
Retrying send messsage due to RetriableException -
org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.. Turn on debugging
to get a full stack trace
2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
Retrying send messsage due to RetriableException -
org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.. Turn on debugging
to get a full stack trace
2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
Retrying send messsage due to RetriableException -
org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.. Turn on debugging
to get a full stack trace
2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
Retrying send messsage due to RetriableException -
org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.. Turn on debugging
to get a full stack trace

This has been happening since the "rush hour", when many new messages are
produced to Kafka. Could this be a bug in Kafka or Samza?

Kafka version: 0.10.0.0

The Kafka config and part of the log from a paused container are attached.

Re: Samza container hang on exception

Posted by Yi Pan <ni...@gmail.com>.
Hi, Sining,

I took a look at your log and stack traces and want to clarify two points:

1) Based on the log, it seems that your container actually exited rather
than hung, which is the expected behavior as of Samza 0.10.1 (retry X times,
then error out in the SamzaContainer RunLoop).
2) The Kafka producer client keeps getting "REQUEST_TIMEOUT" exceptions from
the send call. This typically happens when your Kafka cluster is
overwhelmed. There are some known issues in Kafka broker 0.8.2 that cause
the producer to get stuck (KAFKA-1788). We did not get the full stack trace
from the Kafka producer client library from your run, but I suspect that
might be the issue if you are running Kafka broker 0.8.2. I would recommend
increasing your Kafka footprint and moving the broker VIP to a less-loaded
host to see whether the problem goes away.
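
To make the "retry X times and error out" behavior in point 1 concrete, here
is a minimal Java sketch of a bounded retry loop with exponential backoff.
It is only an illustration under stated assumptions, not Samza's actual
implementation: the class name BoundedRetry, the methods backoffMillis and
callWithRetries, and all parameter values are hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; this is NOT Samza's ExponentialSleepStrategy.
public class BoundedRetry {

    /** Delay before the next attempt: initialDelayMs * 2^attempt, capped at maxDelayMs. */
    public static long backoffMillis(int attempt, long initialDelayMs, long maxDelayMs) {
        // Clamp the shift so the multiplier stays small for large attempt counts.
        long factor = 1L << Math.min(attempt, 20);
        return Math.min(maxDelayMs, initialDelayMs * factor);
    }

    /**
     * Runs op, retrying on RuntimeException at most maxRetries times.
     * Once the retries are used up, the last exception is rethrown so the
     * caller (for example a run loop) can shut down instead of sleeping forever.
     */
    public static <T> T callWithRetries(Callable<T> op, int maxRetries,
                                        long initialDelayMs, long maxDelayMs) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (RuntimeException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up and surface the failure to the caller
                }
                Thread.sleep(backoffMillis(attempt, initialDelayMs, maxDelayMs));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        try {
            // Simulated send that always fails, as with a persistent leader error.
            callWithRetries(() -> {
                calls[0]++;
                throw new IllegalStateException("simulated NOT_LEADER_FOR_PARTITION");
            }, 3, 1, 8);
        } catch (IllegalStateException e) {
            System.out.println("gave up after " + calls[0] + " attempts");
        }
    }
}
```

With maxRetries set to 3, the simulated send is attempted four times (one
initial call plus three retries) before the exception reaches the caller. A
loop without such a bound would instead keep the calling thread in
TIMED_WAITING (sleeping) for as long as the error persists.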

Let me know if we can be more helpful.

Thanks!

-Yi

On Fri, Sep 2, 2016 at 2:17 AM, 李斯宁 <li...@gmail.com> wrote:

> yes, upgraded to 0.10.1
>
> jstack:
> https://drive.google.com/open?id=0B19olQZ1dUO8VjltQmtxLTJ4SVdFZWhYWHZ3Y2hMOVhCMWNn
> task log:
> https://drive.google.com/open?id=0B19olQZ1dUO8eVRLWmJCVl9nRlg2UUM4c21udUViWW8tSUVV
>
> On Fri, Sep 2, 2016 at 4:41 PM, Yi Pan <ni...@gmail.com> wrote:
>
> > Hi, Sining,
> >
> > Your note is on a site where I don't have an account, and it requires
> > sign-up. Can you share it via Google Docs, since you have a Gmail account?
> > And just to confirm, you have upgraded and are using 0.10.1 now, right?
> >
> > Thanks, and apologies for the delay.
> >
> > -Yi
> >
> > On Fri, Sep 2, 2016 at 1:03 AM, 李斯宁 <li...@gmail.com> wrote:
> >
> > > Can anyone help with this? Thanks!
> > >
> > > On Thu, Sep 1, 2016 at 11:59 AM, 李斯宁 <li...@gmail.com> wrote:
> > >
> > > > If you cannot see the attachment, please try
> > > > http://note.youdao.com/noteshare?id=56b826c24af47a9fdb600490ce788710
> > > >
> > > > On Thu, Sep 1, 2016 at 1:50 AM, Chinmay Soman <chinmay.cerebro@gmail.com> wrote:
> > > >
> > > >> Sorry, I don't see anything in the attachment. Can you please re-attach
> > > >> and re-send?
> > > >>
> > > >> On Wed, Aug 31, 2016 at 3:27 AM, 李斯宁 <li...@gmail.com> wrote:
> > > >>
> > > >> > It seems upgrading did not solve the problem. All tasks hung during
> > > >> > today's "rush hour".
> > > >> > I attached the log and jstack output.
> > > >> >
> > > >> > SAMZA-911 is meant to fix this by stopping the process after too many
> > > >> > failed retries, but the process is still there and hanging.
> > > >> >
> > > >> > On Mon, Aug 22, 2016 at 1:14 PM, 李斯宁 <li...@gmail.com> wrote:
> > > >> >
> > > >> >> Thanks so much, I'll try.
> > > >> >>
> > > >> >> On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <ni...@gmail.com> wrote:
> > > >> >>
> > > >> >>> Hi, Sining,
> > > >> >>>
> > > >> >>> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
> > > >> >>> upgrade to 0.10.1.
> > > >> >>>
> > > >> >>> Thanks!
> > > >> >>>
> > > >> >>> -Yi
> > > >> >>>
> > > >> >>> On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <li...@gmail.com> wrote:
> > > >> >>>
> > > >> >>> > I have tried restarting every Kafka server. The container did not
> > > >> >>> > recover.
> > > >> >>> >
> > > >> >>> > The log contains the following:
> > > >> >>> >
> > > >> >>> > 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > > >> >>> > 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > > >> >>> >
> > > >> >>> >
> > > >> >>> > jstack shows:
> > > >> >>> >
> > > >> >>> > "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
> > > >> >>> > java.lang.Thread.State: TIMED_WAITING (sleeping)
> > > >> >>> > at java.lang.Thread.sleep(Native Method)
> > > >> >>> > at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
> > > >> >>> > at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
> > > >> >>> > at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
> > > >> >>> > at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
> > > >> >>> > at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
> > > >> >>> > at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
> > > >> >>> > at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
> > > >> >>> > at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
> > > >> >>> > at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
> > > >> >>> > at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
> > > >> >>> > at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
> > > >> >>> > at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
> > > >> >>> > at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
> > > >> >>> > at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
> > > >> >>> > at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
> > > >> >>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
> > > >> >>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
> > > >> >>> > at scala.collection.immutable.List.foreach(List.scala:318)
> > > >> >>> > at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
> > > >> >>> > at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
> > > >> >>> > at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
> > > >> >>> > at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
> > > >> >>> > at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
> > > >> >>> > at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
> > > >> >>> > at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
> > > >> >>> > at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
> > > >> >>> > at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
> > > >> >>> >
> > > >> >>> > Maybe the partition leader changed during rush hour, and the metrics
> > > >> >>> > writer did not notice the change and kept retrying again and again?
> > > >> >>> >
> > > >> >>> > Any response is appreciated :)
> > > >> >>> >
> > > >> >>> > On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <li...@gmail.com> wrote:
> > > >> >>> >
> > > >> >>> > > The end of the container's log shows these entries:
> > > >> >>> > >
> > > >> >>> > > 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > > 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > > >> >>> > >
> > > >> >>> > >
> > > >> >>> > >
> > > >> >>> > >
> > > >> >>> > > --
> > > >> >>> > > 李斯宁
> > > >> >>> > >
> > > >> >>> >
> > > >> >>> >
> > > >> >>> >
> > > >> >>> > --
> > > >> >>> > 李斯宁
> > > >> >>> >
> > > >> >>>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> 李斯宁
> > > >> >>
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > 李斯宁
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks and regards
> > > >>
> > > >> Chinmay Soman
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > 李斯宁
> > > >
> > >
> > >
> > >
> > > --
> > > 李斯宁
> > >
> >
>
>
>
> --
> 李斯宁
>

Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
Yes, I upgraded to 0.10.1.

jstack:
https://drive.google.com/open?id=0B19olQZ1dUO8VjltQmtxLTJ4SVdFZWhYWHZ3Y2hMOVhCMWNn
task log:
https://drive.google.com/open?id=0B19olQZ1dUO8eVRLWmJCVl9nRlg2UUM4c21udUViWW8tSUVV

> > >> >>> > left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender
> > >>  :257)
> > >> >>> > Got error produce response with correlation id 490345 on
> > >> >>> topic-partition
> > >> >>> > test3_a2_mobileDictClient_android_uid_imei-4, retrying (18
> > attempts
> > >> >>> > left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender
> > >>  :257)
> > >> >>> > Got error produce response with correlation id 490345 on
> > >> >>> topic-partition
> > >> >>> > test3_a2_mobileDictClient_android_uid_imei-6, retrying (18
> > attempts
> > >> >>> > left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender
> > >>  :257)
> > >> >>> > Got error produce response with correlation id 490346 on
> > >> >>> topic-partition
> > >> >>> > test3_a2_mobileDictClient_android_uid_imei-3, retrying (16
> > attempts
> > >> >>> > left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender
> > >>  :257)
> > >> >>> > Got error produce response with correlation id 490346 on
> > >> >>> topic-partition
> > >> >>> > test3_a2_mobileDictClient_android_uid_imei-4, retrying (17
> > attempts
> > >> >>> > left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender
> > >>  :257)
> > >> >>> > Got error produce response with correlation id 490346 on
> > >> >>> topic-partition
> > >> >>> > test3_a2_mobileDictClient_android_uid_imei-6, retrying (17
> > attempts
> > >> >>> > left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >>
> > >> >>> > >> ...
> > >> >>> > >>
> > >> >>> > >> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer
> > >> :66
> > >> >>> )
> > >> >>> > Retrying send messsage due to RetriableException -
> > >> >>> org.apache.kafka.common.
> > >> >>> > errors.NotLeaderForPartitionException: This server is not the
> > >> leader
> > >> >>> for
> > >> >>> > that topic-partition.. Turn on debugging to get a full stack
> trace
> > >> >>> > >> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer
> > >> :66
> > >> >>> )
> > >> >>> > Retrying send messsage due to RetriableException -
> > >> >>> org.apache.kafka.common.
> > >> >>> > errors.NotLeaderForPartitionException: This server is not the
> > >> leader
> > >> >>> for
> > >> >>> > that topic-partition.. Turn on debugging to get a full stack
> trace
> > >> >>> > >> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer
> > >> :66
> > >> >>> )
> > >> >>> > Retrying send messsage due to RetriableException -
> > >> >>> org.apache.kafka.common.
> > >> >>> > errors.NotLeaderForPartitionException: This server is not the
> > >> leader
> > >> >>> for
> > >> >>> > that topic-partition.. Turn on debugging to get a full stack
> trace
> > >> >>> > >> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer
> > >> :66
> > >> >>> )
> > >> >>> > Retrying send messsage due to RetriableException -
> > >> >>> org.apache.kafka.common.
> > >> >>> > errors.NotLeaderForPartitionException: This server is not the
> > >> leader
> > >> >>> for
> > >> >>> > that topic-partition.. Turn on debugging to get a full stack
> trace
> > >> >>> > >> 2
> > >> >>> > >>
> > >> >>> > >> This happens since "rush hour" for new messages produced to
> > >> kafka.
> > >> >>> May
> > >> >>> > be this is a bug of kafka / samza?
> > >> >>> > >>
> > >> >>> > >> kafka version: 0.10.0.0
> > >> >>> > >>
> > >> >>> > >> kafka config and part of paused log are attached.
> > >> >>> > >>
> > >> >>> > >>
> > >> >>> > >>
> > >> >>> > >
> > >> >>> > >
> > >> >>> > > --
> > >> >>> > > 李斯宁
> > >> >>> > >
> > >> >>> >
> > >> >>> >
> > >> >>> >
> > >> >>> > --
> > >> >>> > 李斯宁
> > >> >>> >
> > >> >>>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> 李斯宁
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > 李斯宁
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks and regards
> > >>
> > >> Chinmay Soman
> > >>
> > >
> > >
> > >
> > > --
> > > 李斯宁
> > >
> >
> >
> >
> > --
> > 李斯宁
> >
>



-- 
李斯宁

Re: Samza container hang on exception

Posted by Yi Pan <ni...@gmail.com>.
Hi, Sining,

Your note is on a site where I don't have an account, and it requires
sign-up. Can you share it via Google Docs, since you have a Gmail account?
And just to confirm, you have upgraded and are using 0.10.1 now, right?

Thanks, and apologies for the delay.

-Yi


Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
Can anyone help with this? Thanks!

On Thu, Sep 1, 2016 at 11:59 AM, 李斯宁 <li...@gmail.com> wrote:

> If you cannot see the attachment, please try http://note.youdao.com/
> noteshare?id=56b826c24af47a9fdb600490ce788710
>
> On Thu, Sep 1, 2016 at 1:50 AM, Chinmay Soman <ch...@gmail.com>
> wrote:
>
>> Sorry dont see anything in the attachment. Can you please re-attach and
>> re-send ?
>>
>> On Wed, Aug 31, 2016 at 3:27 AM, 李斯宁 <li...@gmail.com> wrote:
>>
>> > It seems upgrading does not solve the problem. All task hang in today's
>> > "rush hour".
>> > I attached log and jstack.
>> >
>> > The SAMZA-911 want to fix by stopping the process if failed too much
>> > times.  But the process is still there and hanging.
>> >
>> > On Mon, Aug 22, 2016 at 1:14 PM, 李斯宁 <li...@gmail.com> wrote:
>> >
>> >> Thanks so much, I'll try.
>> >>
>> >> On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <ni...@gmail.com> wrote:
>> >>
>> >>> Hi, Sining,
>> >>>
>> >>> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
>> >>> upgrade to 0.10.1.
>> >>>
>> >>> Thanks!
>> >>>
>> >>> -Yi
>> >>>
>> >>> On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <li...@gmail.com> wrote:
>> >>>
>> >>> > I have tried restart every kafka server.  The container did not
>> >>> recover.
>> >>> >
>> >>> > log have something below:
>> >>> >
>> >>> > 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> >
>> >>> >
>> >>> > jstack shows:
>> >>> >
>> >>> > "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
>> >>> > java.lang.Thread.State: TIMED_WAITING (sleeping)
>> >>> > at java.lang.Thread.sleep(Native Method)
>> >>> > at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
>> >>> > at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
>> >>> > at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
>> >>> > at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
>> >>> > at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
>> >>> > at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
>> >>> > at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
>> >>> > at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
>> >>> > at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
>> >>> > at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
>> >>> > at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
>> >>> > at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
>> >>> > at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
>> >>> > at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
>> >>> > at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
>> >>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
>> >>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
>> >>> > at scala.collection.immutable.List.foreach(List.scala:318)
>> >>> > at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
>> >>> > at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
>> >>> > at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
>> >>> > at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
>> >>> > at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
>> >>> > at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
>> >>> > at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
>> >>> > at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
>> >>> > at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
>> >>> >
>> >>> > May be partition leader has changed in rush hour and metrics writing method
>> >>> > do not recognize that and retry again and again?
>> >>> >
>> >>> > Any response is appreciated :)
>> >>> >
>> >>> > On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <li...@gmail.com> wrote:
>> >>> >
>> >>> > > at the last of the container's log, prints these:
>> >>> > >
>> >>> > > 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > > 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > >
>> >>> > >
>> >>> > > On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <li...@gmail.com> wrote:
>> >>> > >
>> >>> > >> hi, guys
>> >>> > >> I'm using samza in realtime process. After running for about 10
>> >>> hours,
>> >>> > >> some containers paused and not processing.
>> >>> > >>
>> >>> > >> When I looked into the log, I found a lot of
>> >>> > >>
>> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >>> > >>
>> >>> > >> ...
>> >>> > >>
>> >>> > >> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer
>> :66
>> >>> )
>> >>> > Retrying send messsage due to RetriableException -
>> >>> org.apache.kafka.common.
>> >>> > errors.NotLeaderForPartitionException: This server is not the
>> leader
>> >>> for
>> >>> > that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > >> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer
>> :66
>> >>> )
>> >>> > Retrying send messsage due to RetriableException -
>> >>> org.apache.kafka.common.
>> >>> > errors.NotLeaderForPartitionException: This server is not the
>> leader
>> >>> for
>> >>> > that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > >> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer
>> :66
>> >>> )
>> >>> > Retrying send messsage due to RetriableException -
>> >>> org.apache.kafka.common.
>> >>> > errors.NotLeaderForPartitionException: This server is not the
>> leader
>> >>> for
>> >>> > that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > >> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer
>> :66
>> >>> )
>> >>> > Retrying send messsage due to RetriableException -
>> >>> org.apache.kafka.common.
>> >>> > errors.NotLeaderForPartitionException: This server is not the
>> leader
>> >>> for
>> >>> > that topic-partition.. Turn on debugging to get a full stack trace
>> >>> > >> 2
>> >>> > >>
>> >>> > >> This happens since "rush hour" for new messages produced to
>> kafka.
>> >>> May
>> >>> > be this is a bug of kafka / samza?
>> >>> > >>
>> >>> > >> kafka version: 0.10.0.0
>> >>> > >>
>> >>> > >> kafka config and part of paused log are attached.
>> >>> > >>
>> >>> > >>
>> >>> > >>
>> >>> > >
>> >>> > >
>> >>> > > --
>> >>> > > 李斯宁
>> >>> > >
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > 李斯宁
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> 李斯宁
>> >>
>> >
>> >
>> >
>> > --
>> > 李斯宁
>> >
>>
>>
>>
>> --
>> Thanks and regards
>>
>> Chinmay Soman
>>
>
>
>
> --
> 李斯宁
>



-- 
李斯宁

Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
If you cannot see the attachment, please try
http://note.youdao.com/noteshare?id=56b826c24af47a9fdb600490ce788710

On Thu, Sep 1, 2016 at 1:50 AM, Chinmay Soman <ch...@gmail.com>
wrote:

> Sorry dont see anything in the attachment. Can you please re-attach and
> re-send ?



-- 
李斯宁

Re: Samza container hang on exception

Posted by Chinmay Soman <ch...@gmail.com>.
Sorry, I don't see anything in the attachment. Can you please re-attach and
re-send?

On Wed, Aug 31, 2016 at 3:27 AM, 李斯宁 <li...@gmail.com> wrote:

> It seems upgrading does not solve the problem. All task hang in today's
> "rush hour".
> I attached log and jstack.
>
> The SAMZA-911 want to fix by stopping the process if failed too much
> times.  But the process is still there and hanging.



-- 
Thanks and regards

Chinmay Soman

Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
It seems upgrading does not solve the problem. All tasks hung during today's
"rush hour".
I have attached the log and jstack output.

SAMZA-911 is meant to fix this by stopping the process after too many
failures, but the process is still there and hanging.
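
To make the expected behaviour concrete, here is a minimal sketch (Python,
not Samza's actual code) of a retry loop that backs off exponentially but
gives up after a fixed number of attempts, which is how the SAMZA-911 fix is
described above. The names send_with_retry, RetriableError, backoff_delays
and max_attempts are made up for illustration; the real retry path runs
through KafkaSystemProducer and ExponentialSleepStrategy, as the jstack in
this thread shows. The 10-second cap is only chosen to mirror the 10-second
spacing of the KafkaSystemProducer warnings in this thread; the other
numbers are assumptions.

```python
import time


class RetriableError(Exception):
    """Stand-in for a transient failure such as NOT_LEADER_FOR_PARTITION."""


def backoff_delays(initial=0.1, multiplier=2.0, maximum=10.0):
    """Yield an endless series of exponentially growing delays, capped at `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= multiplier


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it succeeds or `max_attempts` calls have failed.

    Re-raises the last RetriableError once the budget is used up, so the
    caller (here: the container) can shut down instead of sleeping forever.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RetriableError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure to the caller
            sleep(next(delays))


if __name__ == "__main__":
    def always_fails():
        raise RetriableError("This server is not the leader for that topic-partition.")

    slept = []  # record the delays instead of really sleeping
    try:
        send_with_retry(always_fails, max_attempts=4, sleep=slept.append)
    except RetriableError as err:
        print("gave up:", err)
    print("delays used:", slept)
```

A loop without the max_attempts check would keep the calling thread in
sleep() for as long as the error persists, which looks the same from outside
as the TIMED_WAITING main thread in the jstack.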

On Mon, Aug 22, 2016 at 1:14 PM, 李斯宁 <li...@gmail.com> wrote:

> Thanks so much, I'll try.
>
> On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <ni...@gmail.com> wrote:
>
>> Hi, Sining,
>>
>> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
>> upgrade to 0.10.1.
>>
>> Thanks!
>>
>> -Yi
>>
>> On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <li...@gmail.com> wrote:
>>
>> > I have tried restart every kafka server.  The container did not recover.
>> >
>> > log have something below:
>> >
>> > 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>> > 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> > 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
>> >
>> >
>> > jstack shows:
>> >
>> > "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
>> > java.lang.Thread.State: TIMED_WAITING (sleeping)
>> > at java.lang.Thread.sleep(Native Method)
>> > at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
>> > at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
>> > at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
>> > at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
>> > at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
>> > at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
>> > at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
>> > at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
>> > at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
>> > at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
>> > at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
>> > at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
>> > at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
>> > at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
>> > at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
>> > at scala.collection.immutable.List.foreach(List.scala:318)
>> > at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
>> > at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
>> > at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
>> > at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
>> > at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
>> > at org.apache.samza.container.SamzaContainer.run(SamzaContainer
>> .scala:553)
>> > at
>> > org.apache.samza.container.SamzaContainer$.safeMain(
>> > SamzaContainer.scala:92)
>> > at org.apache.samza.container.SamzaContainer$.main(
>> > SamzaContainer.scala:66)
>> > at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
>> >
>> > May be partition leader has changed in rush hour and metrics writing
>> method
>> > do not recognize that and retry again and again?
>> >
>> > Any response is appreciated :)
>> >
>> > On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <li...@gmail.com> wrote:
>> >
>> > > at the last of the container's log, prints these:
>> > >
>> > > 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > > 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > >
>> > >
>> > > On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <li...@gmail.com> wrote:
>> > >
>> > >> hi, guys
>> > >> I'm using samza in realtime process. After running for about 10
>> hours,
>> > >> some containers paused and not processing.
>> > >>
>> > >> When I looked into the log, I found a lot of
>> > >>
>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
>> > Got error produce response with correlation id 490345 on topic-partition
>> > test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts
>> > left). Error: NOT_LEADER_FOR_PARTITION
>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
>> > Got error produce response with correlation id 490345 on topic-partition
>> > test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts
>> > left). Error: NOT_LEADER_FOR_PARTITION
>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
>> > Got error produce response with correlation id 490345 on topic-partition
>> > test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts
>> > left). Error: NOT_LEADER_FOR_PARTITION
>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
>> > Got error produce response with correlation id 490346 on topic-partition
>> > test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts
>> > left). Error: NOT_LEADER_FOR_PARTITION
>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
>> > Got error produce response with correlation id 490346 on topic-partition
>> > test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts
>> > left). Error: NOT_LEADER_FOR_PARTITION
>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
>> > Got error produce response with correlation id 490346 on topic-partition
>> > test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts
>> > left). Error: NOT_LEADER_FOR_PARTITION
>> > >>
>> > >> ...
>> > >>
>> > >> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > >> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > >> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > >> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
>> > Retrying send messsage due to RetriableException -
>> org.apache.kafka.common.
>> > errors.NotLeaderForPartitionException: This server is not the leader
>> for
>> > that topic-partition.. Turn on debugging to get a full stack trace
>> > >> 2
>> > >>
>> > >> This happens since "rush hour" for new messages produced to kafka.
>> May
>> > be this is a bug of kafka / samza?
>> > >>
>> > >> kafka version: 0.10.0.0
>> > >>
>> > >> kafka config and part of paused log are attached.
>> > >>
>> > >>
>> > >>
>> > >
>> > >
>> > > --
>> > > 李斯宁
>> > >
>> >
>> >
>> >
>> > --
>> > 李斯宁
>> >
>>
>
>
>
> --
> 李斯宁
>



-- 
李斯宁

Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
Thanks so much, I'll try.

On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <ni...@gmail.com> wrote:

> Hi, Sining,
>
> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
> upgrade to 0.10.1.
>
> Thanks!
>
> -Yi
>



-- 
李斯宁

Re: Samza container hang on exception

Posted by Yi Pan <ni...@gmail.com>.
Hi, Sining,

This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
upgrade to 0.10.1.

Thanks!

-Yi

On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <li...@gmail.com> wrote:

> I have tried restarting every Kafka server, but the container did not recover.
>
> The log contains the following:
>
> 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
>
>
> jstack shows:
>
> "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
> at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
> at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
> at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
> at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
> at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
> at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
> at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
> at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
> at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
> at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
> at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
> at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
> at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
> at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
> at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
> at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
> at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
> at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
> at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
> at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
> at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
> at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
> at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
> at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
>
> Maybe the partition leader changed during rush hour and the metrics-writing
> method does not recognize that, so it retries again and again?
>
> Any response is appreciated :)
>
> On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <li...@gmail.com> wrote:
>
> > at the last of the container's log, prints these:
> >
> > 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> > 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 )
> Retrying send messsage due to RetriableException - org.apache.kafka.common.
> errors.NotLeaderForPartitionException: This server is not the leader for
> that topic-partition.. Turn on debugging to get a full stack trace
> >
> >
> > On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <li...@gmail.com> wrote:
> >
> >> hi, guys
> >> I'm using samza in realtime process. After running for about 10 hours,
> >> some containers paused and not processing.
> >>
> >> When I looked into the log, I found a lot of
> >>
> >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
> Got error produce response with correlation id 490345 on topic-partition
> test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts
> left). Error: NOT_LEADER_FOR_PARTITION
> >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
> Got error produce response with correlation id 490345 on topic-partition
> test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts
> left). Error: NOT_LEADER_FOR_PARTITION
> >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
> Got error produce response with correlation id 490345 on topic-partition
> test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts
> left). Error: NOT_LEADER_FOR_PARTITION
> >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
> Got error produce response with correlation id 490346 on topic-partition
> test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts
> left). Error: NOT_LEADER_FOR_PARTITION
> >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
> Got error produce response with correlation id 490346 on topic-partition
> test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts
> left). Error: NOT_LEADER_FOR_PARTITION
> >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257)
> Got error produce response with correlation id 490346 on topic-partition
> test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts
> left). Error: NOT_LEADER_FOR_PARTITION
> >>
> >> ...
> >>
> >> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> >> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> >> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> >> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> >> 2
> >>
> >> This happens since "rush hour" for new messages produced to kafka. May be this is a bug of kafka / samza?
> >>
> >> kafka version: 0.10.0.0
> >>
> >> kafka config and part of paused log are attached.
> >>
> >>
> >>
> >
> >
> > --
> > 李斯宁
> >
>
>
>
> --
> 李斯宁
>

Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
I tried restarting every Kafka server, but the container did not recover.

The log contains the following:

2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION


jstack shows:

"main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
    at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
    at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
    at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
    at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
    at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
    at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
    at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
    at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
    at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
    at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
    at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
    at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
    at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
    at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
    at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
    at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
    at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
    at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
    at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
    at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
    at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
    at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
    at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
    at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)

Maybe the partition leader changed during rush hour, and the metrics-writing
code does not notice the change and keeps retrying again and again?
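
The jstack output above shows the main thread sleeping inside
ExponentialSleepStrategy, called from KafkaSystemProducer.send. Below is a
minimal Java sketch of that kind of retry loop, to illustrate why the
container looks hung rather than crashed. It is not Samza's actual code:
RetryLoopSketch, RetriableError and sendWithRetry are hypothetical names, and
treating maxAttempts <= 0 as "retry forever" is an assumption made for
illustration only.

```java
class RetryLoopSketch {
    /** Hypothetical stand-in for a retriable error such as NOT_LEADER_FOR_PARTITION. */
    static class RetriableError extends RuntimeException {
        RetriableError(String message) {
            super(message);
        }
    }

    /**
     * Runs op until it succeeds, sleeping with exponential backoff after each
     * retriable failure. maxAttempts <= 0 means "retry forever" (assumption).
     * Returns the number of attempts that were made.
     */
    static int sendWithRetry(Runnable op, long initialDelayMs, long maxDelayMs, int maxAttempts) {
        long delay = initialDelayMs;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                op.run();
                return attempts;
            } catch (RetriableError e) {
                // Give up only when a positive bound has been reached.
                if (maxAttempts > 0 && attempts >= maxAttempts) {
                    throw e;
                }
                try {
                    // The calling thread is TIMED_WAITING (sleeping) here.
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
                // Double the delay, capped at maxDelayMs.
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated send that fails three times before succeeding.
        int[] failuresLeft = {3};
        Runnable flakySend = () -> {
            if (failuresLeft[0]-- > 0) {
                throw new RetriableError("NOT_LEADER_FOR_PARTITION");
            }
        };
        int attempts = sendWithRetry(flakySend, 1, 8, 0);
        System.out.println("succeeded after " + attempts + " attempts");
        // prints "succeeded after 4 attempts"
    }
}
```

If the send never succeeds and no bound is set, a loop like this never
returns, so the calling thread stays in TIMED_WAITING (sleeping) and the log
shows one retry warning per backoff interval, which is consistent with the
dump and the log above.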

Any response is appreciated :)

On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <li...@gmail.com> wrote:

> at the last of the container's log, prints these:
>
> 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace



-- 
李斯宁

Re: Samza container hang on exception

Posted by 李斯宁 <li...@gmail.com>.
The end of the container's log shows these lines:

2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace


On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <li...@gmail.com> wrote:

> hi, guys
> I'm using samza in realtime process. After running for about 10 hours,
> some containers paused and not processing.
>
> When I looked into the log, I found a lot of
>
> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>
> ...
>
> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> 2
>
> This happens since "rush hour" for new messages produced to kafka. May be this is a bug of kafka / samza?
>
> kafka version: 0.10.0.0
>
> kafka config and part of paused log are attached.
>
>
>


-- 
李斯宁