You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Vadim Keylis <vk...@gmail.com> on 2013/08/27 07:12:43 UTC

Instances became unresponsive

Somehow I am getting my instances of kafka to crash. I started kafka
instances one by one and they started successfully. Later it some how two
of 3 instances became completely unresponsive. The process is running, but
connnection over jmx or taking heat dump not possible. The last one some
what resposnive.
I am not sure how server get to this state. Is there anything I can monitor
to predict instances about to crash. What are ways to recover without data
loss? What am I doing wrong to get to this state. Please advise.
I poke around error logs on hosts that are not responsive and here are the
errors I found. One that I have not listed LeaderNotFoundExceotion.

 The most puzzling is about zookeeper as it was not redeployed or updated.
[2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata
for partition [self_reactivation,0] (kafka.server.KafkaApis)
kafka.common.ReplicaNotAvailableException
        at
kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
        at
kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
        at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
        at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
        at
scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)
        at
scala.collection.TraversableLike$class.map(TraversableLike.scala:233)


in server.log
[2013-08-26 21:00:51,942] ERROR Conditional update of path
/brokers/topics/meetme/partitions/12/state with data {
"controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
"version":1 } and expected version 2 failed due to
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
BadVersion for /brokers/topics/meetme/partitions/12/state
(kafka.utils.ZkUtils$)
[2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
zkVersion [2] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
[2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking
ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
[2013-08-26 21:00:51,993] ERROR Conditional update of path
/brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6,
"isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
version 2 failed due to
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
BadVersion for /brokers/topics/meetme/partitions/4/state
(kafka.utils.ZkUtils$)
[2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
zkVersion [2] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
[2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking
ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
[2013-08-26 21:00:52,107] ERROR Conditional update of path
/brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6,
"isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
version 3 failed due to
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
BadVersion for /brokers/topics/meetme/partitions/6/state
(kafka.utils.ZkUtils$)
[2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5: Cached
zkVersion [3] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)

Re: Instances became unresponsive

Posted by Jun Rao <ju...@gmail.com>.
Vadim,

Without knowing the original cause, it's hard to for me to say how to
recover from it or prevent it from happening. If you stop all producers and
restart the whole cluster, does that bring the cluster to a healthy state?

Going forward, I recommend that you add monitoring of the brokers and keep
the log4j logs for a few days. This way, if the problem shows up again, we
can see which broker first has the problem and what's causing it.

Thanks,

Jun


On Tue, Aug 27, 2013 at 11:50 PM, Vadim Keylis <vk...@gmail.com>wrote:

> Hello Jun. Unfortunately I do not have logs from broker 6 to find out
> reasons for it to be unresponsive, but yes it was not healthy. I found it
> to be unresponsive as well.
> How can I recover from all this failures with minimum data loss?
>
>
>
>
> On Tue, Aug 27, 2013 at 8:51 PM, Jun Rao <ju...@gmail.com> wrote:
>
> > It seems the replica fetch thread died because of socket timeout
> (defaults
> > to 30 secs). Was broker 6 healthy at that point?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Aug 27, 2013 at 11:36 AM, Vadim Keylis <vkeylis2009@gmail.com
> > >wrote:
> >
> > > We do not use controlled shutdown through JMX, its configured in the
> > > property file. I do not see control shutdown message at the time I
> > > initiated the shutdown. However searching for the string produced the
> > > following error messages which happened hours before I started shutting
> > > down service.
> > >
> > > [2013-08-26 12:38:18,850] WARN [ReplicaFetcherThread--1-6], Error in
> > fetch
> > > Name: FetchRequest; Version: 0; CorrelationId: 1541; ClientId:
> > > ReplicaFetcherThread--1-6; ReplicaId: 5; MaxWait: 500 ms; MinBytes: 1
> > > bytes; RequestInfo: [pets_pageview,25] ->
> > > PartitionFetchInfo(0,1048576),[mm_msg,26] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,22] ->
> > > PartitionFetchInfo(0,1048576),[page_timings,5] ->
> > > PartitionFetchInfo(0,1048576),[cafe_purchases,13] ->
> > > PartitionFetchInfo(0,1048576),[meetme_spotlight_action,33] ->
> > > PartitionFetchInfo(0,1048576),[mob_gift,5] ->
> > > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,18] ->
> > > PartitionFetchInfo(0,1048576),[security_trigger,11] ->
> > > PartitionFetchInfo(0,1048576),[pysm_click,30] ->
> > > PartitionFetchInfo(0,1048576),[gold_blacklist,21] ->
> > > PartitionFetchInfo(0,1048576),[meetme_oops,9] ->
> > > PartitionFetchInfo(0,1048576),[m3_session_info,15] ->
> > > PartitionFetchInfo(0,1048576),[link_review,10] ->
> > > PartitionFetchInfo(0,1048576),[cafe_debug,28] ->
> > > PartitionFetchInfo(0,1048576),[m3_login_button,0] ->
> > > PartitionFetchInfo(0,1048576),[pets_level,24] ->
> > > PartitionFetchInfo(0,1048576),[login_detail,9] ->
> > > PartitionFetchInfo(0,1048576),[click_mail,31] ->
> > > PartitionFetchInfo(0,1048576),[pets_wish,15] ->
> > > PartitionFetchInfo(0,1048576),[page_view_admin,6] ->
> > > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,9] ->
> > > PartitionFetchInfo(0,1048576),[pets_wish,24] ->
> > > PartitionFetchInfo(0,1048576),[cafe_food_spoiled,23] ->
> > > PartitionFetchInfo(0,1048576),[pets_wish,33] ->
> > > PartitionFetchInfo(0,1048576),[account_notifications,11] ->
> > > PartitionFetchInfo(0,1048576),[google_transactions,7] ->
> > > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,3] ->
> > > PartitionFetchInfo(0,1048576),[pets_economy_change,15] ->
> > > PartitionFetchInfo(0,1048576),[payment,13] ->
> > > PartitionFetchInfo(0,1048576),[validation,0] ->
> > > PartitionFetchInfo(0,1048576),[meetme_new_contact_count,5] ->
> > > PartitionFetchInfo(0,1048576),[mail_send,18] ->
> > > PartitionFetchInfo(0,1048576),[lightbox_click,28] ->
> > > PartitionFetchInfo(0,1048576),[rso_scanner_append,5] ->
> > > PartitionFetchInfo(0,1048576),[mob_gift,2] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,24] ->
> > > PartitionFetchInfo(0,1048576),[groups_user_actions,22] ->
> > > PartitionFetchInfo(0,1048576),[jstiming,23] ->
> > > PartitionFetchInfo(0,1048576),[viral_contact_inviters,26] ->
> > > PartitionFetchInfo(0,1048576),[meetme_matches_tab,24] ->
> > > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,19] ->
> > > PartitionFetchInfo(0,1048576),[reg_check,15] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiters_quit,9] ->
> > > PartitionFetchInfo(0,1048576),[groups_search,28] ->
> > > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,15] ->
> > > PartitionFetchInfo(0,1048576),[virtual_gift,21] ->
> > > PartitionFetchInfo(0,1048576),[groupchat_room,21] ->
> > > PartitionFetchInfo(0,1048576),[rapleaf_inter,10] ->
> > > PartitionFetchInfo(0,1048576),[reg_check,18] ->
> > > PartitionFetchInfo(0,1048576),[viral_contact_inviters,29] ->
> > > PartitionFetchInfo(0,1048576),[mail_send,21] ->
> > > PartitionFetchInfo(0,1048576),[pets_economy_change,0] ->
> > > PartitionFetchInfo(0,1048576),[pets_classifier_test,19] ->
> > > PartitionFetchInfo(0,1048576),[cafe_goals,10] ->
> > > PartitionFetchInfo(0,1048576),[pets_browse,9] ->
> > > PartitionFetchInfo(0,1048576),[pets_achievements,24] ->
> > > PartitionFetchInfo(0,1048576),[content_review_preprocess,31] ->
> > > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,12] ->
> > > PartitionFetchInfo(0,1048576),[link_render,16] ->
> > > PartitionFetchInfo(0,1048576),[ent_wt_winners,2] ->
> > > PartitionFetchInfo(0,1048576),[meetme,23] ->
> > > PartitionFetchInfo(1178,1048576),[gwallet_notification,30] ->
> > > PartitionFetchInfo(0,1048576),[m3_login_button,18] ->
> > > PartitionFetchInfo(0,1048576),[mobile_logout,3] ->
> > > PartitionFetchInfo(0,1048576),[trialpay_notification,30] ->
> > > PartitionFetchInfo(0,1048576),[sponsorpay_notification,35] ->
> > > PartitionFetchInfo(0,1048576),[groupchat_room,6] ->
> > > PartitionFetchInfo(0,1048576),[pets_reload,30] ->
> > > PartitionFetchInfo(0,1048576),[report_source,1] ->
> > > PartitionFetchInfo(0,1048576),[pets_nec_buygold,21] ->
> > > PartitionFetchInfo(0,1048576),[set_primary_photo,13] ->
> > > PartitionFetchInfo(0,1048576),[ent_wt_lb_sort,2] ->
> > > PartitionFetchInfo(0,1048576),[video_shares,24] ->
> > > PartitionFetchInfo(0,1048576),[viral_contact_importers,13] ->
> > > PartitionFetchInfo(0,1048576),[click_mail,13] ->
> > > PartitionFetchInfo(0,1048576),[pets_achievements,6] ->
> > > PartitionFetchInfo(0,1048576),[newsfeed_likes,10] ->
> > > PartitionFetchInfo(0,1048576),[pets_achievements,12] ->
> > > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,31] ->
> > > PartitionFetchInfo(0,1048576),[photoviewer_message_click,19] ->
> > > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,1] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_conf,6] ->
> > > PartitionFetchInfo(0,1048576),[meetme_headliners_purchase,28] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashflow,33] ->
> > > PartitionFetchInfo(0,1048576),[top8_feat_users,6] ->
> > > PartitionFetchInfo(0,1048576),[http_headers,29] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,24] ->
> > > PartitionFetchInfo(0,1048576),[payment,4] ->
> > > PartitionFetchInfo(0,1048576),[talk_recommend_click,13] ->
> > > PartitionFetchInfo(0,1048576),[mob_help,3] ->
> > > PartitionFetchInfo(0,1048576),[auto_cancel,6] ->
> > > PartitionFetchInfo(0,1048576),[bouncemail,32] ->
> > > PartitionFetchInfo(0,1048576),[user_support,23] ->
> > > PartitionFetchInfo(0,1048576),[groups_search,34] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,16] ->
> > > PartitionFetchInfo(0,1048576),[trialpay_notification,21] ->
> > > PartitionFetchInfo(0,1048576),[lightbox_view,6] ->
> > > PartitionFetchInfo(0,1048576),[gwallet_notification,21] ->
> > > PartitionFetchInfo(0,1048576),[cafe_unspoils,20] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashflow,9] ->
> > > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,34] ->
> > > PartitionFetchInfo(0,1048576),[chat_login,4] ->
> > > PartitionFetchInfo(0,1048576),[google_transactions,28] ->
> > > PartitionFetchInfo(0,1048576),[meetme_new_contact_count,11] ->
> > > PartitionFetchInfo(0,1048576),[meetme_spotlight_purchase,28] ->
> > > PartitionFetchInfo(0,1048576),[validation_email_change,24] ->
> > > PartitionFetchInfo(0,1048576),[meetme_oops,30] ->
> > > PartitionFetchInfo(0,1048576),[pets_reset_achievement,18] ->
> > > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,32] ->
> > > PartitionFetchInfo(0,1048576),[security_trigger,5] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_winners,32] ->
> > > PartitionFetchInfo(0,1048576),[validation,24] ->
> > > PartitionFetchInfo(0,1048576),[lightbox_click,31] ->
> > > PartitionFetchInfo(0,1048576),[viral_landing_page,7] ->
> > > PartitionFetchInfo(0,1048576),[mobile_performance,26] ->
> > > PartitionFetchInfo(0,1048576),[viral_reg_pv,28] ->
> > > PartitionFetchInfo(0,1048576),[statictiming,8] ->
> > > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,20] ->
> > > PartitionFetchInfo(0,1048576),[mobile_logout,12] ->
> > > PartitionFetchInfo(0,1048576),[validation_lightbox_action,5] ->
> > > PartitionFetchInfo(0,1048576),[cafe_food_served,10] ->
> > > PartitionFetchInfo(0,1048576),[content_checkboxes,4] ->
> > > PartitionFetchInfo(0,1048576),[meetme_matches_tab,27] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,27] ->
> > > PartitionFetchInfo(0,1048576),[auto_cancel,27] ->
> > > PartitionFetchInfo(0,1048576),[pets_reset_achievement,9] ->
> > > PartitionFetchInfo(0,1048576),[m3_login_button,9] ->
> > > PartitionFetchInfo(0,1048576),[chat_login,16] ->
> > > PartitionFetchInfo(0,1048576),[rapleaf_inter,13] ->
> > > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,6] ->
> > > PartitionFetchInfo(0,1048576),[link_review,1] ->
> > > PartitionFetchInfo(0,1048576),[itunes_notification,6] ->
> > > PartitionFetchInfo(0,1048576),[m3_session_info,24] ->
> > > PartitionFetchInfo(0,1048576),[account_notifications,23] ->
> > > PartitionFetchInfo(0,1048576),[photoviewer_message_click,16] ->
> > > PartitionFetchInfo(0,1048576),[pets_economy_change,18] ->
> > > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,9] ->
> > > PartitionFetchInfo(0,1048576),[virtual_gift,3] ->
> > > PartitionFetchInfo(0,1048576),[complains,8] ->
> > > PartitionFetchInfo(0,1048576),[pets_purchased,20] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,18] ->
> > > PartitionFetchInfo(0,1048576),[rso_scanner_append,23] ->
> > > PartitionFetchInfo(0,1048576),[viral_contact_importers,4] ->
> > > PartitionFetchInfo(0,1048576),[viral_contact_inviters,17] ->
> > > PartitionFetchInfo(0,1048576),[meetme_matches_tab,21] ->
> > > PartitionFetchInfo(0,1048576),[viral_invite,31] ->
> > > PartitionFetchInfo(0,1048576),[page_view_api,13] ->
> > > PartitionFetchInfo(1143,1048576),[mm_msg,17] ->
> > > PartitionFetchInfo(0,1048576),[cancel,22] ->
> > > PartitionFetchInfo(0,1048576),[mobile_performance,20] ->
> > > PartitionFetchInfo(0,1048576),[set_primary_photo,22] ->
> > > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,10] ->
> > > PartitionFetchInfo(0,1048576),[mob_properties,28] ->
> > > PartitionFetchInfo(0,1048576),[viral_registration,22] ->
> > > PartitionFetchInfo(0,1048576),[user_contact,9] ->
> > > PartitionFetchInfo(0,1048576),[rapleaf_image,2] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,25] ->
> > > PartitionFetchInfo(0,1048576),[cafe_debug,7] ->
> > > PartitionFetchInfo(0,1048576),[mobile_app_activities,8] ->
> > > PartitionFetchInfo(0,1048576),[cafe_food_spoiled,20] ->
> > > PartitionFetchInfo(0,1048576),[validation,9] ->
> > > PartitionFetchInfo(0,1048576),[pets_achievements,15] ->
> > > PartitionFetchInfo(0,1048576),[gwallet_notification,3] ->
> > > PartitionFetchInfo(0,1048576),[index_pageview,33] ->
> > > PartitionFetchInfo(0,1048576),[security_trigger,23] ->
> > > PartitionFetchInfo(0,1048576),[link_render,22] ->
> > > PartitionFetchInfo(0,1048576),[lightbox_click,19] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,30] ->
> > > PartitionFetchInfo(0,1048576),[questions,13] ->
> > > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,15] ->
> > > PartitionFetchInfo(0,1048576),[phoneflow,31] ->
> > > PartitionFetchInfo(0,1048576),[pets_buyback_lightbox,18] ->
> > > PartitionFetchInfo(0,1048576),[mob_premium,6] ->
> > > PartitionFetchInfo(0,1048576),[cafe_unspoils,5] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,34] ->
> > > PartitionFetchInfo(0,1048576),[viral_reg_pv,16] ->
> > > PartitionFetchInfo(0,1048576),[http_headers,23] ->
> > > PartitionFetchInfo(0,1048576),[viral_landing_page,10] ->
> > > PartitionFetchInfo(0,1048576),[mob_properties,13] ->
> > > PartitionFetchInfo(0,1048576),[mobile_api,13] ->
> > > PartitionFetchInfo(0,1048576),[pets_purchased,32] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_bid,22] ->
> > > PartitionFetchInfo(0,1048576),[photo_view,22] ->
> > > PartitionFetchInfo(0,1048576),[viral_import_conf_pv,18] ->
> > > PartitionFetchInfo(0,1048576),[mob_viral_job,28] ->
> > > PartitionFetchInfo(0,1048576),[newsfeed_filter_change,22] ->
> > > PartitionFetchInfo(0,1048576),[password_change,12] ->
> > > PartitionFetchInfo(0,1048576),[ent_wt_winners,23] ->
> > > PartitionFetchInfo(0,1048576),[m3_account_cancel,15] ->
> > > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,6] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashflow,27] ->
> > > PartitionFetchInfo(0,1048576),[rapleaf_inter,31] ->
> > > PartitionFetchInfo(0,1048576),[user_contact,33] ->
> > > PartitionFetchInfo(0,1048576),[validation_email_change,0] ->
> > > PartitionFetchInfo(0,1048576),[pets_classifier_pysb,18] ->
> > > PartitionFetchInfo(0,1048576),[http_headers,14] ->
> > > PartitionFetchInfo(0,1048576),[meetme_matches_tab,30] ->
> > > PartitionFetchInfo(0,1048576),[m3_user_block,23] ->
> > > PartitionFetchInfo(0,1048576),[mob_premium,9] ->
> > > PartitionFetchInfo(0,1048576),[page_view_api,19] ->
> > > PartitionFetchInfo(1125,1048576),[photoviewer_like,15] ->
> > > PartitionFetchInfo(0,1048576),[meetme_oops,27] ->
> > > PartitionFetchInfo(0,1048576),[photoviewer_message_click,1] ->
> > > PartitionFetchInfo(0,1048576),[page_view_admin,27] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiters_quit,6] ->
> > > PartitionFetchInfo(0,1048576),[mob_premium,15] ->
> > > PartitionFetchInfo(0,1048576),[itunes_notification,15] ->
> > > PartitionFetchInfo(0,1048576),[phoneflow,22] ->
> > > PartitionFetchInfo(0,1048576),[auto_cancel,30] ->
> > > PartitionFetchInfo(0,1048576),[skins_library,33] ->
> > > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,2] ->
> > > PartitionFetchInfo(0,1048576),[security_trigger,8] ->
> > > PartitionFetchInfo(0,1048576),[test_assign,0] ->
> > > PartitionFetchInfo(0,1048576),[test_assign,9] ->
> > > PartitionFetchInfo(0,1048576),[cafe_badge,3] ->
> > > PartitionFetchInfo(0,1048576),[mob_properties,31] ->
> > > PartitionFetchInfo(0,1048576),[pets_classifier_bp,7] ->
> > > PartitionFetchInfo(0,1048576),[chat_login,13] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiters_hire,18] ->
> > > PartitionFetchInfo(0,1048576),[click_mail,7] ->
> > > PartitionFetchInfo(0,1048576),[mob_gift,14] ->
> > > PartitionFetchInfo(0,1048576),[password_change,3] ->
> > > PartitionFetchInfo(0,1048576),[mm_msg,8] ->
> > > PartitionFetchInfo(0,1048576),[tagline,1] ->
> > > PartitionFetchInfo(0,1048576),[meetme_oops,21] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashflow,18] ->
> > > PartitionFetchInfo(0,1048576),[newsfeed_likes,22] ->
> > > PartitionFetchInfo(0,1048576),[video_shares,33] ->
> > > PartitionFetchInfo(0,1048576),[pets_classifier_test,28] ->
> > > PartitionFetchInfo(0,1048576),[itunes_notification,33] ->
> > > PartitionFetchInfo(0,1048576),[trialpay_notification,24] ->
> > > PartitionFetchInfo(0,1048576),[viral_import,28] ->
> > > PartitionFetchInfo(0,1048576),[chat_guest,4] ->
> > > PartitionFetchInfo(0,1048576),[registration_mobile,35] ->
> > > PartitionFetchInfo(0,1048576),[sponsorpay_notification,20] ->
> > > PartitionFetchInfo(0,1048576),[tagline,16] ->
> > > PartitionFetchInfo(0,1048576),[auto_cancel,15] ->
> > > PartitionFetchInfo(0,1048576),[pets_level,0] ->
> > > PartitionFetchInfo(0,1048576),[security_trigger,20] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_bid,7] ->
> > > PartitionFetchInfo(0,1048576),[statictiming,29] ->
> > > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,21] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashruns_pageview,29] ->
> > > PartitionFetchInfo(0,1048576),[newsfeed_likes,13] ->
> > > PartitionFetchInfo(0,1048576),[page_view_admin,24] ->
> > > PartitionFetchInfo(0,1048576),[cafe_waiters_hire,30] ->
> > > PartitionFetchInfo(0,1048576),[matches_bar_view,35] ->
> > > PartitionFetchInfo(0,1048576),[pets_pageview,31] ->
> > > PartitionFetchInfo(0,1048576),[external_url,15] ->
> > > PartitionFetchInfo(0,1048576),[newsfeed_posts,34] ->
> > > PartitionFetchInfo(0,1048576),[viral_import,13] ->
> > > PartitionFetchInfo(0,1048576),[pets_classifier_bp,22] ->
> > > PartitionFetchInfo(0,1048576),[virtual_gift,6] ->
> > > PartitionFetchInfo(0,1048576),[rapleaf_image,17] ->
> > > PartitionFetchInfo(0,1048576),[cafe_goals,28] ->
> > > PartitionFetchInfo(0,1048576),[language_reg,35] ->
> > > PartitionFetchInfo(0,1048576),[pets_nec_view,12] ->
> > > PartitionFetchInfo(0,1048576),[pets_economy_change,3] ->
> > > PartitionFetchInfo(0,1048576),[pets_purchased,26] ->
> > > PartitionFetchInfo(0,1048576),[mob_properties,10] ->
> > > PartitionFetchInfo(0,1048576),[mob_gift,26] ->
> > > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,13] ->
> > > PartitionFetchInfo(0,1048576),[viral_import_pv,22] ->
> > > PartitionFetchInfo(0,1048576),[photo_view,28] ->
> > > PartitionFetchInfo(0,1048576),[videos,10] ->
> > > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,15] ->
> > > PartitionFetchInfo(0,1048576),[phoneflow,19] ->
> > > PartitionFetchInfo(0,1048576),[cafe_debug,19] ->
> > > PartitionFetchInfo(0,1048576),[pets_level,3] ->
> > > PartitionFetchInfo(0,1048576),[mob_gift,23] ->
> > > PartitionFetchInfo(0,1048576),[pets_reset_achievement,27] ->
> > > PartitionFetchInfo(0,1048576),[m3_profile_views,2] ->
> > > PartitionFetchInfo(0,1048576),[bouncemail,5] ->
> > > PartitionFetchInfo(0,1048576),[mob_properties,4] ->
> > > PartitionFetchInfo(0,1048576),[content_review_preprocess,28] ->
> > > PartitionFetchInfo(0,1048576),[pysm_click,3] ->
> > > PartitionFetchInfo(0,1048576),[m3_auth,23] ->
> > > PartitionFetchInfo(0,1048576),[friend_request,2] ->
> > > PartitionFetchInfo(0,1048576),[photoviewer_message_click,4] ->
> > > PartitionFetchInfo(0,1048576),[push_notifications,19] ->
> > > PartitionFetchInfo(0,1048576),[mob_spend_ap,8] ->
> > > PartitionFetchInfo(0,1048576),[purchase_actions,3] ->
> > > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,13] ->
> > > PartitionFetchInfo(0,1048576),[sponsorpay_notification,23] ->
> > > PartitionFetchInfo(0,1048576),[cafe_unspoils,8] ->
> > > PartitionFetchInfo(0,1048576),[meetme_spotlight_action,3] ->
> > > PartitionFetchInfo(0,1048576),[viral_registration,7] ->
> > > PartitionFetchInfo(0,1048576),[complains,20] ->
> > > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,18] ->
> > > PartitionFetchInfo(0,1048576),[sponsorpay_notification,5] ->
> > > PartitionFetchInfo(0,1048576),[pets_nec_view,0] ->
> > > PartitionFetchInfo(0,1048576),[chat_guest,16] ->
> > > PartitionFetchInfo(0,1048576),[promo_toast_view,1] ->
> > > PartitionFetchInfo(0,1048576),[cancel,25] ->
> > > PartitionFetchInfo(0,1048576),[jstiming,8] ->
> > > PartitionFetchInfo(0,1048576),[pets_wish,12] ->
> > > PartitionFetchInfo(0,1048576),[rapleaf_image,5] ->
> > > PartitionFetchInfo(0,1048576),[language_reg,29] ->
> > > PartitionFetchInfo(0,1048576),[long_query,28] ->
> > > PartitionFetchInfo(0,1048576),[pets_cashflow,6] ->
> > > PartitionFetchInfo(0,1048576),[mobile_click,28] ->
> > > PartitionFetchInfo(0,1048576),[mob_help,18] ->
> > > PartitionFetchInfo(0,1048576),[talk_recommend_click,22] ->
> > > PartitionFetchInfo(0,1048576),[set_primary_photo,16] ->
> > > PartitionFetchInfo(0,1048576),[pets_browse,3] -> Part
> > > java.net.SocketTimeoutException
> > >         at
> > > sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
> > >         at
> sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
> > >         at
> > >
> >
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
> > >         at kafka.utils.Utils$.read(Utils.scala:394)
> > >         at
> > >
> > >
> >
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> > >         at
> > > kafka.network.Receive$class.readCompletely(Transmission.scala:56)
> > >         at
> > >
> > >
> >
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
> > >         at
> > kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
> > >         at
> > > kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
> > >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
> > >         at
> > >
> > >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
> > >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> > >         at
> kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
> > >         at
> > >
> > >
> >
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
> > >         at
> > >
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
> > >         at
> > kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> > >
> > >
> > > [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Stopped
> > >  (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Shutdown
> > > completed (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:18,858] INFO [ReplicaFetcherThread-0-6], Shutting
> down
> > > (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:23,821] INFO Closing socket connection to /
> 10.15.11.24
> > .
> > > (kafka.network.Processor)
> > > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Stopped
> > >  (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Shutdown
> > > completed (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-2-6], Shutting
> down
> > > (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Stopped
> > >  (kafka.server.ReplicaFetcherThread)
> > > [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Shutdown
> > > completed (kafka.server.ReplicaFetcherThread)
> > >
> > >
> > >
> > > On Tue, Aug 27, 2013 at 10:05 AM, Neha Narkhede <
> neha.narkhede@gmail.com
> > > >wrote:
> > >
> > > > When you said you tried to shutdown the broker, did you try
> controlled
> > > > shutdown? Do you see "Shutting down" in the logs of the broker that
> > > seemed
> > > > to hang?
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <vkeylis2009@gmail.com
> >
> > > > wrote:
> > > >
> > > > > No. They actually were stuck. Not responding to shutdown request. I
> > had
> > > > to
> > > > > kill them with kill -9 command. I try to take heap dump which hang
> as
> > > > well.
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:
> > > > >
> > > > > > The errors you listed may not be serious, as long as they are
> > > > transient.
> > > > > > When you say 2 of the brokers are not responsive, are they
> issuing
> > > > fetch
> > > > > > requests to the 3rd broker (look at the request log)? During a
> > > restart
> > > > of
> > > > > > the whole cluster, brokers that are started later may not have
> any
> > > > leader
> > > > > > and thus won't take any request from the client. You will need to
> > run
> > > > the
> > > > > > leader balance tool.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <
> > > vkeylis2009@gmail.com
> > > > > >wrote:
> > > > > >
> > > > > >> Somehow I am getting my instances of kafka to crash. I started
> > kafka
> > > > > >> instances one by one and they started successfully. Later it
> some
> > > how
> > > > > two
> > > > > >> of 3 instances became completely unresponsive. The process is
> > > running,
> > > > > but
> > > > > >> connnection over jmx or taking heat dump not possible. The last
> > one
> > > > some
> > > > > >> what resposnive.
> > > > > >> I am not sure how server get to this state. Is there anything I
> > can
> > > > > monitor
> > > > > >> to predict instances about to crash. What are ways to recover
> > > without
> > > > > data
> > > > > >> loss? What am I doing wrong to get to this state. Please advise.
> > > > > >> I poke around error logs on hosts that are not responsive and
> here
> > > are
> > > > > the
> > > > > >> errors I found. One that I have not listed
> > LeaderNotFoundExceotion.
> > > > > >>
> > > > > >> The most puzzling is about zookeeper as it was not redeployed or
> > > > > updated.
> > > > > >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while
> fetching
> > > > > metadata
> > > > > >> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> > > > > >> kafka.common.ReplicaNotAvailableException
> > > > > >>        at
> > > > > >>
> > > > >
> > >
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
> > > > > >>        at
> > > > > >>
> > > > >
> > >
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
> > > > > >>        at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > > > > >>        at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > > > > >>        at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> > > > > >>        at scala.collection.immutable.List.foreach(List.scala:76)
> > > > > >>        at
> > > > > >>
> > > scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> > > > > >>
> > > > > >>
> > > > > >> in server.log
> > > > > >> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> > > > > >> /brokers/topics/meetme/partitions/12/state with data {
> > > > > >> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> > > > > >> "version":1 } and expected version 2 failed due to
> > > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > > > KeeperErrorCode =
> > > > > >> BadVersion for /brokers/topics/meetme/partitions/12/state
> > > > > >> (kafka.utils.ZkUtils$)
> > > > > >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker
> 5:
> > > > Cached
> > > > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > > > > >> (kafka.cluster.Partition)
> > > > > >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5:
> > > > > Shrinking
> > > > > >> ISR for partition [meetme,4] from 5,4 to 5
> > (kafka.cluster.Partition)
> > > > > >> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> > > > > >> /brokers/topics/meetme/partitions/4/state with data {
> > > > > "controller_epoch":6,
> > > > > >> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and
> > > expected
> > > > > >> version 2 failed due to
> > > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > > > KeeperErrorCode =
> > > > > >> BadVersion for /brokers/topics/meetme/partitions/4/state
> > > > > >> (kafka.utils.ZkUtils$)
> > > > > >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5:
> > > > Cached
> > > > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > > > > >> (kafka.cluster.Partition)
> > > > > >> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5:
> > > > > Shrinking
> > > > > >> ISR for partition [meetme,6] from 5,4 to 5
> > (kafka.cluster.Partition)
> > > > > >> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> > > > > >> /brokers/topics/meetme/partitions/6/state with data {
> > > > > "controller_epoch":6,
> > > > > >> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and
> > > expected
> > > > > >> version 3 failed due to
> > > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > > > KeeperErrorCode =
> > > > > >> BadVersion for /brokers/topics/meetme/partitions/6/state
> > > > > >> (kafka.utils.ZkUtils$)
> > > > > >> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5:
> > > > Cached
> > > > > >> zkVersion [3] not equal to that in zookeeper, skip updating ISR
> > > > > >> (kafka.cluster.Partition)
> > > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Instances became unresponsive

Posted by Vadim Keylis <vk...@gmail.com>.
How to prevent or proper configure since any broker can become unresponsive?

Thanks,
Vadim


On Tue, Aug 27, 2013 at 11:50 PM, Vadim Keylis <vk...@gmail.com>wrote:

> Hello Jun. Unfortunately I do not have logs from broker 6 to find out
> reasons for it to be unresponsive, but yes it was not healthy. I found it
> to be unresponsive as well.
> How can I recover from all this failures with minimum data loss?
>
>
>
>
> On Tue, Aug 27, 2013 at 8:51 PM, Jun Rao <ju...@gmail.com> wrote:
>
>> It seems the replica fetch thread died because of socket timeout (defaults
>> to 30 secs). Was broker 6 healthy at that point?
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Tue, Aug 27, 2013 at 11:36 AM, Vadim Keylis <vkeylis2009@gmail.com
>> >wrote:
>>
>> > We do not use controlled shutdown through JMX, its configured in the
>> > property file. I do not see control shutdown message at the time I
>> > initiated the shutdown. However searching for the string produced the
>> > following error messages which happened hours before I started shutting
>> > down service.
>> >
>> > [2013-08-26 12:38:18,850] WARN [ReplicaFetcherThread--1-6], Error in
>> fetch
>> > Name: FetchRequest; Version: 0; CorrelationId: 1541; ClientId:
>> > ReplicaFetcherThread--1-6; ReplicaId: 5; MaxWait: 500 ms; MinBytes: 1
>> > bytes; RequestInfo: [pets_pageview,25] ->
>> > PartitionFetchInfo(0,1048576),[mm_msg,26] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,22] ->
>> > PartitionFetchInfo(0,1048576),[page_timings,5] ->
>> > PartitionFetchInfo(0,1048576),[cafe_purchases,13] ->
>> > PartitionFetchInfo(0,1048576),[meetme_spotlight_action,33] ->
>> > PartitionFetchInfo(0,1048576),[mob_gift,5] ->
>> > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,18] ->
>> > PartitionFetchInfo(0,1048576),[security_trigger,11] ->
>> > PartitionFetchInfo(0,1048576),[pysm_click,30] ->
>> > PartitionFetchInfo(0,1048576),[gold_blacklist,21] ->
>> > PartitionFetchInfo(0,1048576),[meetme_oops,9] ->
>> > PartitionFetchInfo(0,1048576),[m3_session_info,15] ->
>> > PartitionFetchInfo(0,1048576),[link_review,10] ->
>> > PartitionFetchInfo(0,1048576),[cafe_debug,28] ->
>> > PartitionFetchInfo(0,1048576),[m3_login_button,0] ->
>> > PartitionFetchInfo(0,1048576),[pets_level,24] ->
>> > PartitionFetchInfo(0,1048576),[login_detail,9] ->
>> > PartitionFetchInfo(0,1048576),[click_mail,31] ->
>> > PartitionFetchInfo(0,1048576),[pets_wish,15] ->
>> > PartitionFetchInfo(0,1048576),[page_view_admin,6] ->
>> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,9] ->
>> > PartitionFetchInfo(0,1048576),[pets_wish,24] ->
>> > PartitionFetchInfo(0,1048576),[cafe_food_spoiled,23] ->
>> > PartitionFetchInfo(0,1048576),[pets_wish,33] ->
>> > PartitionFetchInfo(0,1048576),[account_notifications,11] ->
>> > PartitionFetchInfo(0,1048576),[google_transactions,7] ->
>> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,3] ->
>> > PartitionFetchInfo(0,1048576),[pets_economy_change,15] ->
>> > PartitionFetchInfo(0,1048576),[payment,13] ->
>> > PartitionFetchInfo(0,1048576),[validation,0] ->
>> > PartitionFetchInfo(0,1048576),[meetme_new_contact_count,5] ->
>> > PartitionFetchInfo(0,1048576),[mail_send,18] ->
>> > PartitionFetchInfo(0,1048576),[lightbox_click,28] ->
>> > PartitionFetchInfo(0,1048576),[rso_scanner_append,5] ->
>> > PartitionFetchInfo(0,1048576),[mob_gift,2] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,24] ->
>> > PartitionFetchInfo(0,1048576),[groups_user_actions,22] ->
>> > PartitionFetchInfo(0,1048576),[jstiming,23] ->
>> > PartitionFetchInfo(0,1048576),[viral_contact_inviters,26] ->
>> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,24] ->
>> > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,19] ->
>> > PartitionFetchInfo(0,1048576),[reg_check,15] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiters_quit,9] ->
>> > PartitionFetchInfo(0,1048576),[groups_search,28] ->
>> > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,15] ->
>> > PartitionFetchInfo(0,1048576),[virtual_gift,21] ->
>> > PartitionFetchInfo(0,1048576),[groupchat_room,21] ->
>> > PartitionFetchInfo(0,1048576),[rapleaf_inter,10] ->
>> > PartitionFetchInfo(0,1048576),[reg_check,18] ->
>> > PartitionFetchInfo(0,1048576),[viral_contact_inviters,29] ->
>> > PartitionFetchInfo(0,1048576),[mail_send,21] ->
>> > PartitionFetchInfo(0,1048576),[pets_economy_change,0] ->
>> > PartitionFetchInfo(0,1048576),[pets_classifier_test,19] ->
>> > PartitionFetchInfo(0,1048576),[cafe_goals,10] ->
>> > PartitionFetchInfo(0,1048576),[pets_browse,9] ->
>> > PartitionFetchInfo(0,1048576),[pets_achievements,24] ->
>> > PartitionFetchInfo(0,1048576),[content_review_preprocess,31] ->
>> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,12] ->
>> > PartitionFetchInfo(0,1048576),[link_render,16] ->
>> > PartitionFetchInfo(0,1048576),[ent_wt_winners,2] ->
>> > PartitionFetchInfo(0,1048576),[meetme,23] ->
>> > PartitionFetchInfo(1178,1048576),[gwallet_notification,30] ->
>> > PartitionFetchInfo(0,1048576),[m3_login_button,18] ->
>> > PartitionFetchInfo(0,1048576),[mobile_logout,3] ->
>> > PartitionFetchInfo(0,1048576),[trialpay_notification,30] ->
>> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,35] ->
>> > PartitionFetchInfo(0,1048576),[groupchat_room,6] ->
>> > PartitionFetchInfo(0,1048576),[pets_reload,30] ->
>> > PartitionFetchInfo(0,1048576),[report_source,1] ->
>> > PartitionFetchInfo(0,1048576),[pets_nec_buygold,21] ->
>> > PartitionFetchInfo(0,1048576),[set_primary_photo,13] ->
>> > PartitionFetchInfo(0,1048576),[ent_wt_lb_sort,2] ->
>> > PartitionFetchInfo(0,1048576),[video_shares,24] ->
>> > PartitionFetchInfo(0,1048576),[viral_contact_importers,13] ->
>> > PartitionFetchInfo(0,1048576),[click_mail,13] ->
>> > PartitionFetchInfo(0,1048576),[pets_achievements,6] ->
>> > PartitionFetchInfo(0,1048576),[newsfeed_likes,10] ->
>> > PartitionFetchInfo(0,1048576),[pets_achievements,12] ->
>> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,31] ->
>> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,19] ->
>> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,1] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_conf,6] ->
>> > PartitionFetchInfo(0,1048576),[meetme_headliners_purchase,28] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashflow,33] ->
>> > PartitionFetchInfo(0,1048576),[top8_feat_users,6] ->
>> > PartitionFetchInfo(0,1048576),[http_headers,29] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,24] ->
>> > PartitionFetchInfo(0,1048576),[payment,4] ->
>> > PartitionFetchInfo(0,1048576),[talk_recommend_click,13] ->
>> > PartitionFetchInfo(0,1048576),[mob_help,3] ->
>> > PartitionFetchInfo(0,1048576),[auto_cancel,6] ->
>> > PartitionFetchInfo(0,1048576),[bouncemail,32] ->
>> > PartitionFetchInfo(0,1048576),[user_support,23] ->
>> > PartitionFetchInfo(0,1048576),[groups_search,34] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,16] ->
>> > PartitionFetchInfo(0,1048576),[trialpay_notification,21] ->
>> > PartitionFetchInfo(0,1048576),[lightbox_view,6] ->
>> > PartitionFetchInfo(0,1048576),[gwallet_notification,21] ->
>> > PartitionFetchInfo(0,1048576),[cafe_unspoils,20] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashflow,9] ->
>> > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,34] ->
>> > PartitionFetchInfo(0,1048576),[chat_login,4] ->
>> > PartitionFetchInfo(0,1048576),[google_transactions,28] ->
>> > PartitionFetchInfo(0,1048576),[meetme_new_contact_count,11] ->
>> > PartitionFetchInfo(0,1048576),[meetme_spotlight_purchase,28] ->
>> > PartitionFetchInfo(0,1048576),[validation_email_change,24] ->
>> > PartitionFetchInfo(0,1048576),[meetme_oops,30] ->
>> > PartitionFetchInfo(0,1048576),[pets_reset_achievement,18] ->
>> > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,32] ->
>> > PartitionFetchInfo(0,1048576),[security_trigger,5] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_winners,32] ->
>> > PartitionFetchInfo(0,1048576),[validation,24] ->
>> > PartitionFetchInfo(0,1048576),[lightbox_click,31] ->
>> > PartitionFetchInfo(0,1048576),[viral_landing_page,7] ->
>> > PartitionFetchInfo(0,1048576),[mobile_performance,26] ->
>> > PartitionFetchInfo(0,1048576),[viral_reg_pv,28] ->
>> > PartitionFetchInfo(0,1048576),[statictiming,8] ->
>> > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,20] ->
>> > PartitionFetchInfo(0,1048576),[mobile_logout,12] ->
>> > PartitionFetchInfo(0,1048576),[validation_lightbox_action,5] ->
>> > PartitionFetchInfo(0,1048576),[cafe_food_served,10] ->
>> > PartitionFetchInfo(0,1048576),[content_checkboxes,4] ->
>> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,27] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,27] ->
>> > PartitionFetchInfo(0,1048576),[auto_cancel,27] ->
>> > PartitionFetchInfo(0,1048576),[pets_reset_achievement,9] ->
>> > PartitionFetchInfo(0,1048576),[m3_login_button,9] ->
>> > PartitionFetchInfo(0,1048576),[chat_login,16] ->
>> > PartitionFetchInfo(0,1048576),[rapleaf_inter,13] ->
>> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,6] ->
>> > PartitionFetchInfo(0,1048576),[link_review,1] ->
>> > PartitionFetchInfo(0,1048576),[itunes_notification,6] ->
>> > PartitionFetchInfo(0,1048576),[m3_session_info,24] ->
>> > PartitionFetchInfo(0,1048576),[account_notifications,23] ->
>> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,16] ->
>> > PartitionFetchInfo(0,1048576),[pets_economy_change,18] ->
>> > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,9] ->
>> > PartitionFetchInfo(0,1048576),[virtual_gift,3] ->
>> > PartitionFetchInfo(0,1048576),[complains,8] ->
>> > PartitionFetchInfo(0,1048576),[pets_purchased,20] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,18] ->
>> > PartitionFetchInfo(0,1048576),[rso_scanner_append,23] ->
>> > PartitionFetchInfo(0,1048576),[viral_contact_importers,4] ->
>> > PartitionFetchInfo(0,1048576),[viral_contact_inviters,17] ->
>> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,21] ->
>> > PartitionFetchInfo(0,1048576),[viral_invite,31] ->
>> > PartitionFetchInfo(0,1048576),[page_view_api,13] ->
>> > PartitionFetchInfo(1143,1048576),[mm_msg,17] ->
>> > PartitionFetchInfo(0,1048576),[cancel,22] ->
>> > PartitionFetchInfo(0,1048576),[mobile_performance,20] ->
>> > PartitionFetchInfo(0,1048576),[set_primary_photo,22] ->
>> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,10] ->
>> > PartitionFetchInfo(0,1048576),[mob_properties,28] ->
>> > PartitionFetchInfo(0,1048576),[viral_registration,22] ->
>> > PartitionFetchInfo(0,1048576),[user_contact,9] ->
>> > PartitionFetchInfo(0,1048576),[rapleaf_image,2] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,25] ->
>> > PartitionFetchInfo(0,1048576),[cafe_debug,7] ->
>> > PartitionFetchInfo(0,1048576),[mobile_app_activities,8] ->
>> > PartitionFetchInfo(0,1048576),[cafe_food_spoiled,20] ->
>> > PartitionFetchInfo(0,1048576),[validation,9] ->
>> > PartitionFetchInfo(0,1048576),[pets_achievements,15] ->
>> > PartitionFetchInfo(0,1048576),[gwallet_notification,3] ->
>> > PartitionFetchInfo(0,1048576),[index_pageview,33] ->
>> > PartitionFetchInfo(0,1048576),[security_trigger,23] ->
>> > PartitionFetchInfo(0,1048576),[link_render,22] ->
>> > PartitionFetchInfo(0,1048576),[lightbox_click,19] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,30] ->
>> > PartitionFetchInfo(0,1048576),[questions,13] ->
>> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,15] ->
>> > PartitionFetchInfo(0,1048576),[phoneflow,31] ->
>> > PartitionFetchInfo(0,1048576),[pets_buyback_lightbox,18] ->
>> > PartitionFetchInfo(0,1048576),[mob_premium,6] ->
>> > PartitionFetchInfo(0,1048576),[cafe_unspoils,5] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,34] ->
>> > PartitionFetchInfo(0,1048576),[viral_reg_pv,16] ->
>> > PartitionFetchInfo(0,1048576),[http_headers,23] ->
>> > PartitionFetchInfo(0,1048576),[viral_landing_page,10] ->
>> > PartitionFetchInfo(0,1048576),[mob_properties,13] ->
>> > PartitionFetchInfo(0,1048576),[mobile_api,13] ->
>> > PartitionFetchInfo(0,1048576),[pets_purchased,32] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_bid,22] ->
>> > PartitionFetchInfo(0,1048576),[photo_view,22] ->
>> > PartitionFetchInfo(0,1048576),[viral_import_conf_pv,18] ->
>> > PartitionFetchInfo(0,1048576),[mob_viral_job,28] ->
>> > PartitionFetchInfo(0,1048576),[newsfeed_filter_change,22] ->
>> > PartitionFetchInfo(0,1048576),[password_change,12] ->
>> > PartitionFetchInfo(0,1048576),[ent_wt_winners,23] ->
>> > PartitionFetchInfo(0,1048576),[m3_account_cancel,15] ->
>> > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,6] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashflow,27] ->
>> > PartitionFetchInfo(0,1048576),[rapleaf_inter,31] ->
>> > PartitionFetchInfo(0,1048576),[user_contact,33] ->
>> > PartitionFetchInfo(0,1048576),[validation_email_change,0] ->
>> > PartitionFetchInfo(0,1048576),[pets_classifier_pysb,18] ->
>> > PartitionFetchInfo(0,1048576),[http_headers,14] ->
>> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,30] ->
>> > PartitionFetchInfo(0,1048576),[m3_user_block,23] ->
>> > PartitionFetchInfo(0,1048576),[mob_premium,9] ->
>> > PartitionFetchInfo(0,1048576),[page_view_api,19] ->
>> > PartitionFetchInfo(1125,1048576),[photoviewer_like,15] ->
>> > PartitionFetchInfo(0,1048576),[meetme_oops,27] ->
>> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,1] ->
>> > PartitionFetchInfo(0,1048576),[page_view_admin,27] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiters_quit,6] ->
>> > PartitionFetchInfo(0,1048576),[mob_premium,15] ->
>> > PartitionFetchInfo(0,1048576),[itunes_notification,15] ->
>> > PartitionFetchInfo(0,1048576),[phoneflow,22] ->
>> > PartitionFetchInfo(0,1048576),[auto_cancel,30] ->
>> > PartitionFetchInfo(0,1048576),[skins_library,33] ->
>> > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,2] ->
>> > PartitionFetchInfo(0,1048576),[security_trigger,8] ->
>> > PartitionFetchInfo(0,1048576),[test_assign,0] ->
>> > PartitionFetchInfo(0,1048576),[test_assign,9] ->
>> > PartitionFetchInfo(0,1048576),[cafe_badge,3] ->
>> > PartitionFetchInfo(0,1048576),[mob_properties,31] ->
>> > PartitionFetchInfo(0,1048576),[pets_classifier_bp,7] ->
>> > PartitionFetchInfo(0,1048576),[chat_login,13] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiters_hire,18] ->
>> > PartitionFetchInfo(0,1048576),[click_mail,7] ->
>> > PartitionFetchInfo(0,1048576),[mob_gift,14] ->
>> > PartitionFetchInfo(0,1048576),[password_change,3] ->
>> > PartitionFetchInfo(0,1048576),[mm_msg,8] ->
>> > PartitionFetchInfo(0,1048576),[tagline,1] ->
>> > PartitionFetchInfo(0,1048576),[meetme_oops,21] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashflow,18] ->
>> > PartitionFetchInfo(0,1048576),[newsfeed_likes,22] ->
>> > PartitionFetchInfo(0,1048576),[video_shares,33] ->
>> > PartitionFetchInfo(0,1048576),[pets_classifier_test,28] ->
>> > PartitionFetchInfo(0,1048576),[itunes_notification,33] ->
>> > PartitionFetchInfo(0,1048576),[trialpay_notification,24] ->
>> > PartitionFetchInfo(0,1048576),[viral_import,28] ->
>> > PartitionFetchInfo(0,1048576),[chat_guest,4] ->
>> > PartitionFetchInfo(0,1048576),[registration_mobile,35] ->
>> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,20] ->
>> > PartitionFetchInfo(0,1048576),[tagline,16] ->
>> > PartitionFetchInfo(0,1048576),[auto_cancel,15] ->
>> > PartitionFetchInfo(0,1048576),[pets_level,0] ->
>> > PartitionFetchInfo(0,1048576),[security_trigger,20] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_bid,7] ->
>> > PartitionFetchInfo(0,1048576),[statictiming,29] ->
>> > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,21] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashruns_pageview,29] ->
>> > PartitionFetchInfo(0,1048576),[newsfeed_likes,13] ->
>> > PartitionFetchInfo(0,1048576),[page_view_admin,24] ->
>> > PartitionFetchInfo(0,1048576),[cafe_waiters_hire,30] ->
>> > PartitionFetchInfo(0,1048576),[matches_bar_view,35] ->
>> > PartitionFetchInfo(0,1048576),[pets_pageview,31] ->
>> > PartitionFetchInfo(0,1048576),[external_url,15] ->
>> > PartitionFetchInfo(0,1048576),[newsfeed_posts,34] ->
>> > PartitionFetchInfo(0,1048576),[viral_import,13] ->
>> > PartitionFetchInfo(0,1048576),[pets_classifier_bp,22] ->
>> > PartitionFetchInfo(0,1048576),[virtual_gift,6] ->
>> > PartitionFetchInfo(0,1048576),[rapleaf_image,17] ->
>> > PartitionFetchInfo(0,1048576),[cafe_goals,28] ->
>> > PartitionFetchInfo(0,1048576),[language_reg,35] ->
>> > PartitionFetchInfo(0,1048576),[pets_nec_view,12] ->
>> > PartitionFetchInfo(0,1048576),[pets_economy_change,3] ->
>> > PartitionFetchInfo(0,1048576),[pets_purchased,26] ->
>> > PartitionFetchInfo(0,1048576),[mob_properties,10] ->
>> > PartitionFetchInfo(0,1048576),[mob_gift,26] ->
>> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,13] ->
>> > PartitionFetchInfo(0,1048576),[viral_import_pv,22] ->
>> > PartitionFetchInfo(0,1048576),[photo_view,28] ->
>> > PartitionFetchInfo(0,1048576),[videos,10] ->
>> > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,15] ->
>> > PartitionFetchInfo(0,1048576),[phoneflow,19] ->
>> > PartitionFetchInfo(0,1048576),[cafe_debug,19] ->
>> > PartitionFetchInfo(0,1048576),[pets_level,3] ->
>> > PartitionFetchInfo(0,1048576),[mob_gift,23] ->
>> > PartitionFetchInfo(0,1048576),[pets_reset_achievement,27] ->
>> > PartitionFetchInfo(0,1048576),[m3_profile_views,2] ->
>> > PartitionFetchInfo(0,1048576),[bouncemail,5] ->
>> > PartitionFetchInfo(0,1048576),[mob_properties,4] ->
>> > PartitionFetchInfo(0,1048576),[content_review_preprocess,28] ->
>> > PartitionFetchInfo(0,1048576),[pysm_click,3] ->
>> > PartitionFetchInfo(0,1048576),[m3_auth,23] ->
>> > PartitionFetchInfo(0,1048576),[friend_request,2] ->
>> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,4] ->
>> > PartitionFetchInfo(0,1048576),[push_notifications,19] ->
>> > PartitionFetchInfo(0,1048576),[mob_spend_ap,8] ->
>> > PartitionFetchInfo(0,1048576),[purchase_actions,3] ->
>> > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,13] ->
>> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,23] ->
>> > PartitionFetchInfo(0,1048576),[cafe_unspoils,8] ->
>> > PartitionFetchInfo(0,1048576),[meetme_spotlight_action,3] ->
>> > PartitionFetchInfo(0,1048576),[viral_registration,7] ->
>> > PartitionFetchInfo(0,1048576),[complains,20] ->
>> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,18] ->
>> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,5] ->
>> > PartitionFetchInfo(0,1048576),[pets_nec_view,0] ->
>> > PartitionFetchInfo(0,1048576),[chat_guest,16] ->
>> > PartitionFetchInfo(0,1048576),[promo_toast_view,1] ->
>> > PartitionFetchInfo(0,1048576),[cancel,25] ->
>> > PartitionFetchInfo(0,1048576),[jstiming,8] ->
>> > PartitionFetchInfo(0,1048576),[pets_wish,12] ->
>> > PartitionFetchInfo(0,1048576),[rapleaf_image,5] ->
>> > PartitionFetchInfo(0,1048576),[language_reg,29] ->
>> > PartitionFetchInfo(0,1048576),[long_query,28] ->
>> > PartitionFetchInfo(0,1048576),[pets_cashflow,6] ->
>> > PartitionFetchInfo(0,1048576),[mobile_click,28] ->
>> > PartitionFetchInfo(0,1048576),[mob_help,18] ->
>> > PartitionFetchInfo(0,1048576),[talk_recommend_click,22] ->
>> > PartitionFetchInfo(0,1048576),[set_primary_photo,16] ->
>> > PartitionFetchInfo(0,1048576),[pets_browse,3] -> Part
>> > java.net.SocketTimeoutException
>> >         at
>> > sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
>> >         at
>> sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
>> >         at
>> >
>> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
>> >         at kafka.utils.Utils$.read(Utils.scala:394)
>> >         at
>> >
>> >
>> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> >         at
>> > kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>> >         at
>> >
>> >
>> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>> >         at
>> kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>> >         at
>> > kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
>> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
>> >         at
>> >
>> >
>> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
>> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>> >         at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
>> >         at
>> >
>> >
>> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
>> >         at
>> >
>> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>> >         at
>> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>> >
>> >
>> > [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Stopped
>> >  (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Shutdown
>> > completed (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:18,858] INFO [ReplicaFetcherThread-0-6], Shutting down
>> > (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:23,821] INFO Closing socket connection to /
>> 10.15.11.24.
>> > (kafka.network.Processor)
>> > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Stopped
>> >  (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Shutdown
>> > completed (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-2-6], Shutting down
>> > (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Stopped
>> >  (kafka.server.ReplicaFetcherThread)
>> > [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Shutdown
>> > completed (kafka.server.ReplicaFetcherThread)
>> >
>> >
>> >
>> > On Tue, Aug 27, 2013 at 10:05 AM, Neha Narkhede <
>> neha.narkhede@gmail.com
>> > >wrote:
>> >
>> > > When you said you tried to shutdown the broker, did you try controlled
>> > > shutdown? Do you see "Shutting down" in the logs of the broker that
>> > seemed
>> > > to hang?
>> > >
>> > > Thanks,
>> > > Neha
>> > >
>> > >
>> > > On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <vk...@gmail.com>
>> > > wrote:
>> > >
>> > > > No. They actually were stuck. Not responding to shutdown request. I
>> had
>> > > to
>> > > > kill them with kill -9 command. I try to take heap dump which hang
>> as
>> > > well.
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:
>> > > >
>> > > > > The errors you listed may not be serious, as long as they are
>> > > transient.
>> > > > > When you say 2 of the brokers are not responsive, are they issuing
>> > > fetch
>> > > > > requests to the 3rd broker (look at the request log)? During a
>> > restart
>> > > of
>> > > > > the whole cluster, brokers that are started later may not have any
>> > > leader
>> > > > > and thus won't take any request from the client. You will need to
>> run
>> > > the
>> > > > > leader balance tool.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > >
>> > > > > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <
>> > vkeylis2009@gmail.com
>> > > > >wrote:
>> > > > >
>> > > > >> Somehow I am getting my instances of kafka to crash. I started
>> kafka
>> > > > >> instances one by one and they started successfully. Later it some
>> > how
>> > > > two
>> > > > >> of 3 instances became completely unresponsive. The process is
>> > running,
>> > > > but
>> > > > >> connnection over jmx or taking heat dump not possible. The last
>> one
>> > > some
>> > > > >> what resposnive.
>> > > > >> I am not sure how server get to this state. Is there anything I
>> can
>> > > > monitor
>> > > > >> to predict instances about to crash. What are ways to recover
>> > without
>> > > > data
>> > > > >> loss? What am I doing wrong to get to this state. Please advise.
>> > > > >> I poke around error logs on hosts that are not responsive and
>> here
>> > are
>> > > > the
>> > > > >> errors I found. One that I have not listed
>> LeaderNotFoundExceotion.
>> > > > >>
>> > > > >> The most puzzling is about zookeeper as it was not redeployed or
>> > > > updated.
>> > > > >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching
>> > > > metadata
>> > > > >> for partition [self_reactivation,0] (kafka.server.KafkaApis)
>> > > > >> kafka.common.ReplicaNotAvailableException
>> > > > >>        at
>> > > > >>
>> > > >
>> >
>> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
>> > > > >>        at
>> > > > >>
>> > > >
>> >
>> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
>> > > > >>        at
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>> > > > >>        at
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>> > > > >>        at
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>> > > > >>        at scala.collection.immutable.List.foreach(List.scala:76)
>> > > > >>        at
>> > > > >>
>> > scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>> > > > >>
>> > > > >>
>> > > > >> in server.log
>> > > > >> [2013-08-26 21:00:51,942] ERROR Conditional update of path
>> > > > >> /brokers/topics/meetme/partitions/12/state with data {
>> > > > >> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
>> > > > >> "version":1 } and expected version 2 failed due to
>> > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
>> > > > KeeperErrorCode =
>> > > > >> BadVersion for /brokers/topics/meetme/partitions/12/state
>> > > > >> (kafka.utils.ZkUtils$)
>> > > > >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5:
>> > > Cached
>> > > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> > > > >> (kafka.cluster.Partition)
>> > > > >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5:
>> > > > Shrinking
>> > > > >> ISR for partition [meetme,4] from 5,4 to 5
>> (kafka.cluster.Partition)
>> > > > >> [2013-08-26 21:00:51,993] ERROR Conditional update of path
>> > > > >> /brokers/topics/meetme/partitions/4/state with data {
>> > > > "controller_epoch":6,
>> > > > >> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and
>> > expected
>> > > > >> version 2 failed due to
>> > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
>> > > > KeeperErrorCode =
>> > > > >> BadVersion for /brokers/topics/meetme/partitions/4/state
>> > > > >> (kafka.utils.ZkUtils$)
>> > > > >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5:
>> > > Cached
>> > > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> > > > >> (kafka.cluster.Partition)
>> > > > >> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5:
>> > > > Shrinking
>> > > > >> ISR for partition [meetme,6] from 5,4 to 5
>> (kafka.cluster.Partition)
>> > > > >> [2013-08-26 21:00:52,107] ERROR Conditional update of path
>> > > > >> /brokers/topics/meetme/partitions/6/state with data {
>> > > > "controller_epoch":6,
>> > > > >> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and
>> > expected
>> > > > >> version 3 failed due to
>> > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
>> > > > KeeperErrorCode =
>> > > > >> BadVersion for /brokers/topics/meetme/partitions/6/state
>> > > > >> (kafka.utils.ZkUtils$)
>> > > > >> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5:
>> > > Cached
>> > > > >> zkVersion [3] not equal to that in zookeeper, skip updating ISR
>> > > > >> (kafka.cluster.Partition)
>> > > > >>
>> > > >
>> > >
>> >
>>
>
>

Re: Instances became unresponsive

Posted by Vadim Keylis <vk...@gmail.com>.
Hello Jun. Unfortunately I do not have logs from broker 6 to find out
reasons for it to be unresponsive, but yes it was not healthy. I found it
to be unresponsive as well.
How can I recover from all this failures with minimum data loss?




On Tue, Aug 27, 2013 at 8:51 PM, Jun Rao <ju...@gmail.com> wrote:

> It seems the replica fetch thread died because of socket timeout (defaults
> to 30 secs). Was broker 6 healthy at that point?
>
> Thanks,
>
> Jun
>
>
> On Tue, Aug 27, 2013 at 11:36 AM, Vadim Keylis <vkeylis2009@gmail.com
> >wrote:
>
> > We do not use controlled shutdown through JMX, its configured in the
> > property file. I do not see control shutdown message at the time I
> > initiated the shutdown. However searching for the string produced the
> > following error messages which happened hours before I started shutting
> > down service.
> >
> > [2013-08-26 12:38:18,850] WARN [ReplicaFetcherThread--1-6], Error in
> fetch
> > Name: FetchRequest; Version: 0; CorrelationId: 1541; ClientId:
> > ReplicaFetcherThread--1-6; ReplicaId: 5; MaxWait: 500 ms; MinBytes: 1
> > bytes; RequestInfo: [pets_pageview,25] ->
> > PartitionFetchInfo(0,1048576),[mm_msg,26] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,22] ->
> > PartitionFetchInfo(0,1048576),[page_timings,5] ->
> > PartitionFetchInfo(0,1048576),[cafe_purchases,13] ->
> > PartitionFetchInfo(0,1048576),[meetme_spotlight_action,33] ->
> > PartitionFetchInfo(0,1048576),[mob_gift,5] ->
> > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,18] ->
> > PartitionFetchInfo(0,1048576),[security_trigger,11] ->
> > PartitionFetchInfo(0,1048576),[pysm_click,30] ->
> > PartitionFetchInfo(0,1048576),[gold_blacklist,21] ->
> > PartitionFetchInfo(0,1048576),[meetme_oops,9] ->
> > PartitionFetchInfo(0,1048576),[m3_session_info,15] ->
> > PartitionFetchInfo(0,1048576),[link_review,10] ->
> > PartitionFetchInfo(0,1048576),[cafe_debug,28] ->
> > PartitionFetchInfo(0,1048576),[m3_login_button,0] ->
> > PartitionFetchInfo(0,1048576),[pets_level,24] ->
> > PartitionFetchInfo(0,1048576),[login_detail,9] ->
> > PartitionFetchInfo(0,1048576),[click_mail,31] ->
> > PartitionFetchInfo(0,1048576),[pets_wish,15] ->
> > PartitionFetchInfo(0,1048576),[page_view_admin,6] ->
> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,9] ->
> > PartitionFetchInfo(0,1048576),[pets_wish,24] ->
> > PartitionFetchInfo(0,1048576),[cafe_food_spoiled,23] ->
> > PartitionFetchInfo(0,1048576),[pets_wish,33] ->
> > PartitionFetchInfo(0,1048576),[account_notifications,11] ->
> > PartitionFetchInfo(0,1048576),[google_transactions,7] ->
> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,3] ->
> > PartitionFetchInfo(0,1048576),[pets_economy_change,15] ->
> > PartitionFetchInfo(0,1048576),[payment,13] ->
> > PartitionFetchInfo(0,1048576),[validation,0] ->
> > PartitionFetchInfo(0,1048576),[meetme_new_contact_count,5] ->
> > PartitionFetchInfo(0,1048576),[mail_send,18] ->
> > PartitionFetchInfo(0,1048576),[lightbox_click,28] ->
> > PartitionFetchInfo(0,1048576),[rso_scanner_append,5] ->
> > PartitionFetchInfo(0,1048576),[mob_gift,2] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,24] ->
> > PartitionFetchInfo(0,1048576),[groups_user_actions,22] ->
> > PartitionFetchInfo(0,1048576),[jstiming,23] ->
> > PartitionFetchInfo(0,1048576),[viral_contact_inviters,26] ->
> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,24] ->
> > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,19] ->
> > PartitionFetchInfo(0,1048576),[reg_check,15] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiters_quit,9] ->
> > PartitionFetchInfo(0,1048576),[groups_search,28] ->
> > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,15] ->
> > PartitionFetchInfo(0,1048576),[virtual_gift,21] ->
> > PartitionFetchInfo(0,1048576),[groupchat_room,21] ->
> > PartitionFetchInfo(0,1048576),[rapleaf_inter,10] ->
> > PartitionFetchInfo(0,1048576),[reg_check,18] ->
> > PartitionFetchInfo(0,1048576),[viral_contact_inviters,29] ->
> > PartitionFetchInfo(0,1048576),[mail_send,21] ->
> > PartitionFetchInfo(0,1048576),[pets_economy_change,0] ->
> > PartitionFetchInfo(0,1048576),[pets_classifier_test,19] ->
> > PartitionFetchInfo(0,1048576),[cafe_goals,10] ->
> > PartitionFetchInfo(0,1048576),[pets_browse,9] ->
> > PartitionFetchInfo(0,1048576),[pets_achievements,24] ->
> > PartitionFetchInfo(0,1048576),[content_review_preprocess,31] ->
> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,12] ->
> > PartitionFetchInfo(0,1048576),[link_render,16] ->
> > PartitionFetchInfo(0,1048576),[ent_wt_winners,2] ->
> > PartitionFetchInfo(0,1048576),[meetme,23] ->
> > PartitionFetchInfo(1178,1048576),[gwallet_notification,30] ->
> > PartitionFetchInfo(0,1048576),[m3_login_button,18] ->
> > PartitionFetchInfo(0,1048576),[mobile_logout,3] ->
> > PartitionFetchInfo(0,1048576),[trialpay_notification,30] ->
> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,35] ->
> > PartitionFetchInfo(0,1048576),[groupchat_room,6] ->
> > PartitionFetchInfo(0,1048576),[pets_reload,30] ->
> > PartitionFetchInfo(0,1048576),[report_source,1] ->
> > PartitionFetchInfo(0,1048576),[pets_nec_buygold,21] ->
> > PartitionFetchInfo(0,1048576),[set_primary_photo,13] ->
> > PartitionFetchInfo(0,1048576),[ent_wt_lb_sort,2] ->
> > PartitionFetchInfo(0,1048576),[video_shares,24] ->
> > PartitionFetchInfo(0,1048576),[viral_contact_importers,13] ->
> > PartitionFetchInfo(0,1048576),[click_mail,13] ->
> > PartitionFetchInfo(0,1048576),[pets_achievements,6] ->
> > PartitionFetchInfo(0,1048576),[newsfeed_likes,10] ->
> > PartitionFetchInfo(0,1048576),[pets_achievements,12] ->
> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,31] ->
> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,19] ->
> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,1] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_conf,6] ->
> > PartitionFetchInfo(0,1048576),[meetme_headliners_purchase,28] ->
> > PartitionFetchInfo(0,1048576),[pets_cashflow,33] ->
> > PartitionFetchInfo(0,1048576),[top8_feat_users,6] ->
> > PartitionFetchInfo(0,1048576),[http_headers,29] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,24] ->
> > PartitionFetchInfo(0,1048576),[payment,4] ->
> > PartitionFetchInfo(0,1048576),[talk_recommend_click,13] ->
> > PartitionFetchInfo(0,1048576),[mob_help,3] ->
> > PartitionFetchInfo(0,1048576),[auto_cancel,6] ->
> > PartitionFetchInfo(0,1048576),[bouncemail,32] ->
> > PartitionFetchInfo(0,1048576),[user_support,23] ->
> > PartitionFetchInfo(0,1048576),[groups_search,34] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,16] ->
> > PartitionFetchInfo(0,1048576),[trialpay_notification,21] ->
> > PartitionFetchInfo(0,1048576),[lightbox_view,6] ->
> > PartitionFetchInfo(0,1048576),[gwallet_notification,21] ->
> > PartitionFetchInfo(0,1048576),[cafe_unspoils,20] ->
> > PartitionFetchInfo(0,1048576),[pets_cashflow,9] ->
> > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,34] ->
> > PartitionFetchInfo(0,1048576),[chat_login,4] ->
> > PartitionFetchInfo(0,1048576),[google_transactions,28] ->
> > PartitionFetchInfo(0,1048576),[meetme_new_contact_count,11] ->
> > PartitionFetchInfo(0,1048576),[meetme_spotlight_purchase,28] ->
> > PartitionFetchInfo(0,1048576),[validation_email_change,24] ->
> > PartitionFetchInfo(0,1048576),[meetme_oops,30] ->
> > PartitionFetchInfo(0,1048576),[pets_reset_achievement,18] ->
> > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,32] ->
> > PartitionFetchInfo(0,1048576),[security_trigger,5] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_winners,32] ->
> > PartitionFetchInfo(0,1048576),[validation,24] ->
> > PartitionFetchInfo(0,1048576),[lightbox_click,31] ->
> > PartitionFetchInfo(0,1048576),[viral_landing_page,7] ->
> > PartitionFetchInfo(0,1048576),[mobile_performance,26] ->
> > PartitionFetchInfo(0,1048576),[viral_reg_pv,28] ->
> > PartitionFetchInfo(0,1048576),[statictiming,8] ->
> > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,20] ->
> > PartitionFetchInfo(0,1048576),[mobile_logout,12] ->
> > PartitionFetchInfo(0,1048576),[validation_lightbox_action,5] ->
> > PartitionFetchInfo(0,1048576),[cafe_food_served,10] ->
> > PartitionFetchInfo(0,1048576),[content_checkboxes,4] ->
> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,27] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,27] ->
> > PartitionFetchInfo(0,1048576),[auto_cancel,27] ->
> > PartitionFetchInfo(0,1048576),[pets_reset_achievement,9] ->
> > PartitionFetchInfo(0,1048576),[m3_login_button,9] ->
> > PartitionFetchInfo(0,1048576),[chat_login,16] ->
> > PartitionFetchInfo(0,1048576),[rapleaf_inter,13] ->
> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,6] ->
> > PartitionFetchInfo(0,1048576),[link_review,1] ->
> > PartitionFetchInfo(0,1048576),[itunes_notification,6] ->
> > PartitionFetchInfo(0,1048576),[m3_session_info,24] ->
> > PartitionFetchInfo(0,1048576),[account_notifications,23] ->
> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,16] ->
> > PartitionFetchInfo(0,1048576),[pets_economy_change,18] ->
> > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,9] ->
> > PartitionFetchInfo(0,1048576),[virtual_gift,3] ->
> > PartitionFetchInfo(0,1048576),[complains,8] ->
> > PartitionFetchInfo(0,1048576),[pets_purchased,20] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,18] ->
> > PartitionFetchInfo(0,1048576),[rso_scanner_append,23] ->
> > PartitionFetchInfo(0,1048576),[viral_contact_importers,4] ->
> > PartitionFetchInfo(0,1048576),[viral_contact_inviters,17] ->
> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,21] ->
> > PartitionFetchInfo(0,1048576),[viral_invite,31] ->
> > PartitionFetchInfo(0,1048576),[page_view_api,13] ->
> > PartitionFetchInfo(1143,1048576),[mm_msg,17] ->
> > PartitionFetchInfo(0,1048576),[cancel,22] ->
> > PartitionFetchInfo(0,1048576),[mobile_performance,20] ->
> > PartitionFetchInfo(0,1048576),[set_primary_photo,22] ->
> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,10] ->
> > PartitionFetchInfo(0,1048576),[mob_properties,28] ->
> > PartitionFetchInfo(0,1048576),[viral_registration,22] ->
> > PartitionFetchInfo(0,1048576),[user_contact,9] ->
> > PartitionFetchInfo(0,1048576),[rapleaf_image,2] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,25] ->
> > PartitionFetchInfo(0,1048576),[cafe_debug,7] ->
> > PartitionFetchInfo(0,1048576),[mobile_app_activities,8] ->
> > PartitionFetchInfo(0,1048576),[cafe_food_spoiled,20] ->
> > PartitionFetchInfo(0,1048576),[validation,9] ->
> > PartitionFetchInfo(0,1048576),[pets_achievements,15] ->
> > PartitionFetchInfo(0,1048576),[gwallet_notification,3] ->
> > PartitionFetchInfo(0,1048576),[index_pageview,33] ->
> > PartitionFetchInfo(0,1048576),[security_trigger,23] ->
> > PartitionFetchInfo(0,1048576),[link_render,22] ->
> > PartitionFetchInfo(0,1048576),[lightbox_click,19] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiter_tips,30] ->
> > PartitionFetchInfo(0,1048576),[questions,13] ->
> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,15] ->
> > PartitionFetchInfo(0,1048576),[phoneflow,31] ->
> > PartitionFetchInfo(0,1048576),[pets_buyback_lightbox,18] ->
> > PartitionFetchInfo(0,1048576),[mob_premium,6] ->
> > PartitionFetchInfo(0,1048576),[cafe_unspoils,5] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_spin,34] ->
> > PartitionFetchInfo(0,1048576),[viral_reg_pv,16] ->
> > PartitionFetchInfo(0,1048576),[http_headers,23] ->
> > PartitionFetchInfo(0,1048576),[viral_landing_page,10] ->
> > PartitionFetchInfo(0,1048576),[mob_properties,13] ->
> > PartitionFetchInfo(0,1048576),[mobile_api,13] ->
> > PartitionFetchInfo(0,1048576),[pets_purchased,32] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_bid,22] ->
> > PartitionFetchInfo(0,1048576),[photo_view,22] ->
> > PartitionFetchInfo(0,1048576),[viral_import_conf_pv,18] ->
> > PartitionFetchInfo(0,1048576),[mob_viral_job,28] ->
> > PartitionFetchInfo(0,1048576),[newsfeed_filter_change,22] ->
> > PartitionFetchInfo(0,1048576),[password_change,12] ->
> > PartitionFetchInfo(0,1048576),[ent_wt_winners,23] ->
> > PartitionFetchInfo(0,1048576),[m3_account_cancel,15] ->
> > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,6] ->
> > PartitionFetchInfo(0,1048576),[pets_cashflow,27] ->
> > PartitionFetchInfo(0,1048576),[rapleaf_inter,31] ->
> > PartitionFetchInfo(0,1048576),[user_contact,33] ->
> > PartitionFetchInfo(0,1048576),[validation_email_change,0] ->
> > PartitionFetchInfo(0,1048576),[pets_classifier_pysb,18] ->
> > PartitionFetchInfo(0,1048576),[http_headers,14] ->
> > PartitionFetchInfo(0,1048576),[meetme_matches_tab,30] ->
> > PartitionFetchInfo(0,1048576),[m3_user_block,23] ->
> > PartitionFetchInfo(0,1048576),[mob_premium,9] ->
> > PartitionFetchInfo(0,1048576),[page_view_api,19] ->
> > PartitionFetchInfo(1125,1048576),[photoviewer_like,15] ->
> > PartitionFetchInfo(0,1048576),[meetme_oops,27] ->
> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,1] ->
> > PartitionFetchInfo(0,1048576),[page_view_admin,27] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiters_quit,6] ->
> > PartitionFetchInfo(0,1048576),[mob_premium,15] ->
> > PartitionFetchInfo(0,1048576),[itunes_notification,15] ->
> > PartitionFetchInfo(0,1048576),[phoneflow,22] ->
> > PartitionFetchInfo(0,1048576),[auto_cancel,30] ->
> > PartitionFetchInfo(0,1048576),[skins_library,33] ->
> > PartitionFetchInfo(0,1048576),[hi5_blast_photocount,2] ->
> > PartitionFetchInfo(0,1048576),[security_trigger,8] ->
> > PartitionFetchInfo(0,1048576),[test_assign,0] ->
> > PartitionFetchInfo(0,1048576),[test_assign,9] ->
> > PartitionFetchInfo(0,1048576),[cafe_badge,3] ->
> > PartitionFetchInfo(0,1048576),[mob_properties,31] ->
> > PartitionFetchInfo(0,1048576),[pets_classifier_bp,7] ->
> > PartitionFetchInfo(0,1048576),[chat_login,13] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiters_hire,18] ->
> > PartitionFetchInfo(0,1048576),[click_mail,7] ->
> > PartitionFetchInfo(0,1048576),[mob_gift,14] ->
> > PartitionFetchInfo(0,1048576),[password_change,3] ->
> > PartitionFetchInfo(0,1048576),[mm_msg,8] ->
> > PartitionFetchInfo(0,1048576),[tagline,1] ->
> > PartitionFetchInfo(0,1048576),[meetme_oops,21] ->
> > PartitionFetchInfo(0,1048576),[pets_cashflow,18] ->
> > PartitionFetchInfo(0,1048576),[newsfeed_likes,22] ->
> > PartitionFetchInfo(0,1048576),[video_shares,33] ->
> > PartitionFetchInfo(0,1048576),[pets_classifier_test,28] ->
> > PartitionFetchInfo(0,1048576),[itunes_notification,33] ->
> > PartitionFetchInfo(0,1048576),[trialpay_notification,24] ->
> > PartitionFetchInfo(0,1048576),[viral_import,28] ->
> > PartitionFetchInfo(0,1048576),[chat_guest,4] ->
> > PartitionFetchInfo(0,1048576),[registration_mobile,35] ->
> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,20] ->
> > PartitionFetchInfo(0,1048576),[tagline,16] ->
> > PartitionFetchInfo(0,1048576),[auto_cancel,15] ->
> > PartitionFetchInfo(0,1048576),[pets_level,0] ->
> > PartitionFetchInfo(0,1048576),[security_trigger,20] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_bid,7] ->
> > PartitionFetchInfo(0,1048576),[statictiming,29] ->
> > PartitionFetchInfo(0,1048576),[meetme_auto_message_send,21] ->
> > PartitionFetchInfo(0,1048576),[pets_cashruns_pageview,29] ->
> > PartitionFetchInfo(0,1048576),[newsfeed_likes,13] ->
> > PartitionFetchInfo(0,1048576),[page_view_admin,24] ->
> > PartitionFetchInfo(0,1048576),[cafe_waiters_hire,30] ->
> > PartitionFetchInfo(0,1048576),[matches_bar_view,35] ->
> > PartitionFetchInfo(0,1048576),[pets_pageview,31] ->
> > PartitionFetchInfo(0,1048576),[external_url,15] ->
> > PartitionFetchInfo(0,1048576),[newsfeed_posts,34] ->
> > PartitionFetchInfo(0,1048576),[viral_import,13] ->
> > PartitionFetchInfo(0,1048576),[pets_classifier_bp,22] ->
> > PartitionFetchInfo(0,1048576),[virtual_gift,6] ->
> > PartitionFetchInfo(0,1048576),[rapleaf_image,17] ->
> > PartitionFetchInfo(0,1048576),[cafe_goals,28] ->
> > PartitionFetchInfo(0,1048576),[language_reg,35] ->
> > PartitionFetchInfo(0,1048576),[pets_nec_view,12] ->
> > PartitionFetchInfo(0,1048576),[pets_economy_change,3] ->
> > PartitionFetchInfo(0,1048576),[pets_purchased,26] ->
> > PartitionFetchInfo(0,1048576),[mob_properties,10] ->
> > PartitionFetchInfo(0,1048576),[mob_gift,26] ->
> > PartitionFetchInfo(0,1048576),[registration_email_domain_scan,13] ->
> > PartitionFetchInfo(0,1048576),[viral_import_pv,22] ->
> > PartitionFetchInfo(0,1048576),[photo_view,28] ->
> > PartitionFetchInfo(0,1048576),[videos,10] ->
> > PartitionFetchInfo(0,1048576),[cafe_coin_purchases,15] ->
> > PartitionFetchInfo(0,1048576),[phoneflow,19] ->
> > PartitionFetchInfo(0,1048576),[cafe_debug,19] ->
> > PartitionFetchInfo(0,1048576),[pets_level,3] ->
> > PartitionFetchInfo(0,1048576),[mob_gift,23] ->
> > PartitionFetchInfo(0,1048576),[pets_reset_achievement,27] ->
> > PartitionFetchInfo(0,1048576),[m3_profile_views,2] ->
> > PartitionFetchInfo(0,1048576),[bouncemail,5] ->
> > PartitionFetchInfo(0,1048576),[mob_properties,4] ->
> > PartitionFetchInfo(0,1048576),[content_review_preprocess,28] ->
> > PartitionFetchInfo(0,1048576),[pysm_click,3] ->
> > PartitionFetchInfo(0,1048576),[m3_auth,23] ->
> > PartitionFetchInfo(0,1048576),[friend_request,2] ->
> > PartitionFetchInfo(0,1048576),[photoviewer_message_click,4] ->
> > PartitionFetchInfo(0,1048576),[push_notifications,19] ->
> > PartitionFetchInfo(0,1048576),[mob_spend_ap,8] ->
> > PartitionFetchInfo(0,1048576),[purchase_actions,3] ->
> > PartitionFetchInfo(0,1048576),[promotions_targeted_actions,13] ->
> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,23] ->
> > PartitionFetchInfo(0,1048576),[cafe_unspoils,8] ->
> > PartitionFetchInfo(0,1048576),[meetme_spotlight_action,3] ->
> > PartitionFetchInfo(0,1048576),[viral_registration,7] ->
> > PartitionFetchInfo(0,1048576),[complains,20] ->
> > PartitionFetchInfo(0,1048576),[hi5_image_cleanup,18] ->
> > PartitionFetchInfo(0,1048576),[sponsorpay_notification,5] ->
> > PartitionFetchInfo(0,1048576),[pets_nec_view,0] ->
> > PartitionFetchInfo(0,1048576),[chat_guest,16] ->
> > PartitionFetchInfo(0,1048576),[promo_toast_view,1] ->
> > PartitionFetchInfo(0,1048576),[cancel,25] ->
> > PartitionFetchInfo(0,1048576),[jstiming,8] ->
> > PartitionFetchInfo(0,1048576),[pets_wish,12] ->
> > PartitionFetchInfo(0,1048576),[rapleaf_image,5] ->
> > PartitionFetchInfo(0,1048576),[language_reg,29] ->
> > PartitionFetchInfo(0,1048576),[long_query,28] ->
> > PartitionFetchInfo(0,1048576),[pets_cashflow,6] ->
> > PartitionFetchInfo(0,1048576),[mobile_click,28] ->
> > PartitionFetchInfo(0,1048576),[mob_help,18] ->
> > PartitionFetchInfo(0,1048576),[talk_recommend_click,22] ->
> > PartitionFetchInfo(0,1048576),[set_primary_photo,16] ->
> > PartitionFetchInfo(0,1048576),[pets_browse,3] -> Part
> > java.net.SocketTimeoutException
> >         at
> > sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
> >         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
> >         at
> >
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
> >         at kafka.utils.Utils$.read(Utils.scala:394)
> >         at
> >
> >
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> >         at
> > kafka.network.Receive$class.readCompletely(Transmission.scala:56)
> >         at
> >
> >
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
> >         at
> kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
> >         at
> > kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
> >         at
> >
> >
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> >         at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
> >         at
> >
> >
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
> >         at
> > kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
> >         at
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> >
> >
> > [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Stopped
> >  (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Shutdown
> > completed (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:18,858] INFO [ReplicaFetcherThread-0-6], Shutting down
> > (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:23,821] INFO Closing socket connection to /10.15.11.24
> .
> > (kafka.network.Processor)
> > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Stopped
> >  (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Shutdown
> > completed (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-2-6], Shutting down
> > (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Stopped
> >  (kafka.server.ReplicaFetcherThread)
> > [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Shutdown
> > completed (kafka.server.ReplicaFetcherThread)
> >
> >
> >
> > On Tue, Aug 27, 2013 at 10:05 AM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > When you said you tried to shutdown the broker, did you try controlled
> > > shutdown? Do you see "Shutting down" in the logs of the broker that
> > seemed
> > > to hang?
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <vk...@gmail.com>
> > > wrote:
> > >
> > > > No. They actually were stuck. Not responding to shutdown request. I
> had
> > > to
> > > > kill them with kill -9 command. I try to take heap dump which hang as
> > > well.
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:
> > > >
> > > > > The errors you listed may not be serious, as long as they are
> > > transient.
> > > > > When you say 2 of the brokers are not responsive, are they issuing
> > > fetch
> > > > > requests to the 3rd broker (look at the request log)? During a
> > restart
> > > of
> > > > > the whole cluster, brokers that are started later may not have any
> > > leader
> > > > > and thus won't take any request from the client. You will need to
> run
> > > the
> > > > > leader balance tool.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <
> > vkeylis2009@gmail.com
> > > > >wrote:
> > > > >
> > > > >> Somehow I am getting my instances of kafka to crash. I started
> kafka
> > > > >> instances one by one and they started successfully. Later it some
> > how
> > > > two
> > > > >> of 3 instances became completely unresponsive. The process is
> > running,
> > > > but
> > > > >> connnection over jmx or taking heat dump not possible. The last
> one
> > > some
> > > > >> what resposnive.
> > > > >> I am not sure how server get to this state. Is there anything I
> can
> > > > monitor
> > > > >> to predict instances about to crash. What are ways to recover
> > without
> > > > data
> > > > >> loss? What am I doing wrong to get to this state. Please advise.
> > > > >> I poke around error logs on hosts that are not responsive and here
> > are
> > > > the
> > > > >> errors I found. One that I have not listed
> LeaderNotFoundExceotion.
> > > > >>
> > > > >> The most puzzling is about zookeeper as it was not redeployed or
> > > > updated.
> > > > >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching
> > > > metadata
> > > > >> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> > > > >> kafka.common.ReplicaNotAvailableException
> > > > >>        at
> > > > >>
> > > >
> > kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
> > > > >>        at
> > > > >>
> > > >
> > kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
> > > > >>        at
> > > > >>
> > > > >>
> > > >
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > > > >>        at
> > > > >>
> > > > >>
> > > >
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > > > >>        at
> > > > >>
> > > > >>
> > > >
> > >
> >
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> > > > >>        at scala.collection.immutable.List.foreach(List.scala:76)
> > > > >>        at
> > > > >>
> > scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> > > > >>
> > > > >>
> > > > >> in server.log
> > > > >> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> > > > >> /brokers/topics/meetme/partitions/12/state with data {
> > > > >> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> > > > >> "version":1 } and expected version 2 failed due to
> > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > > KeeperErrorCode =
> > > > >> BadVersion for /brokers/topics/meetme/partitions/12/state
> > > > >> (kafka.utils.ZkUtils$)
> > > > >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5:
> > > Cached
> > > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > > > >> (kafka.cluster.Partition)
> > > > >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5:
> > > > Shrinking
> > > > >> ISR for partition [meetme,4] from 5,4 to 5
> (kafka.cluster.Partition)
> > > > >> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> > > > >> /brokers/topics/meetme/partitions/4/state with data {
> > > > "controller_epoch":6,
> > > > >> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and
> > expected
> > > > >> version 2 failed due to
> > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > > KeeperErrorCode =
> > > > >> BadVersion for /brokers/topics/meetme/partitions/4/state
> > > > >> (kafka.utils.ZkUtils$)
> > > > >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5:
> > > Cached
> > > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > > > >> (kafka.cluster.Partition)
> > > > >> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5:
> > > > Shrinking
> > > > >> ISR for partition [meetme,6] from 5,4 to 5
> (kafka.cluster.Partition)
> > > > >> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> > > > >> /brokers/topics/meetme/partitions/6/state with data {
> > > > "controller_epoch":6,
> > > > >> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and
> > expected
> > > > >> version 3 failed due to
> > > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > > KeeperErrorCode =
> > > > >> BadVersion for /brokers/topics/meetme/partitions/6/state
> > > > >> (kafka.utils.ZkUtils$)
> > > > >> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5:
> > > Cached
> > > > >> zkVersion [3] not equal to that in zookeeper, skip updating ISR
> > > > >> (kafka.cluster.Partition)
> > > > >>
> > > >
> > >
> >
>

Re: Instances became unresponsive

Posted by Jun Rao <ju...@gmail.com>.
It seems the replica fetch thread died because of socket timeout (defaults
to 30 secs). Was broker 6 healthy at that point?

Thanks,

Jun


On Tue, Aug 27, 2013 at 11:36 AM, Vadim Keylis <vk...@gmail.com>wrote:

> We do not use controlled shutdown through JMX, its configured in the
> property file. I do not see control shutdown message at the time I
> initiated the shutdown. However searching for the string produced the
> following error messages which happened hours before I started shutting
> down service.
>
> [2013-08-26 12:38:18,850] WARN [ReplicaFetcherThread--1-6], Error in fetch
> Name: FetchRequest; Version: 0; CorrelationId: 1541; ClientId:
> ReplicaFetcherThread--1-6; ReplicaId: 5; MaxWait: 500 ms; MinBytes: 1
> bytes; RequestInfo: [pets_pageview,25] ->
> PartitionFetchInfo(0,1048576),[mm_msg,26] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_spin,22] ->
> PartitionFetchInfo(0,1048576),[page_timings,5] ->
> PartitionFetchInfo(0,1048576),[cafe_purchases,13] ->
> PartitionFetchInfo(0,1048576),[meetme_spotlight_action,33] ->
> PartitionFetchInfo(0,1048576),[mob_gift,5] ->
> PartitionFetchInfo(0,1048576),[cafe_coin_purchases,18] ->
> PartitionFetchInfo(0,1048576),[security_trigger,11] ->
> PartitionFetchInfo(0,1048576),[pysm_click,30] ->
> PartitionFetchInfo(0,1048576),[gold_blacklist,21] ->
> PartitionFetchInfo(0,1048576),[meetme_oops,9] ->
> PartitionFetchInfo(0,1048576),[m3_session_info,15] ->
> PartitionFetchInfo(0,1048576),[link_review,10] ->
> PartitionFetchInfo(0,1048576),[cafe_debug,28] ->
> PartitionFetchInfo(0,1048576),[m3_login_button,0] ->
> PartitionFetchInfo(0,1048576),[pets_level,24] ->
> PartitionFetchInfo(0,1048576),[login_detail,9] ->
> PartitionFetchInfo(0,1048576),[click_mail,31] ->
> PartitionFetchInfo(0,1048576),[pets_wish,15] ->
> PartitionFetchInfo(0,1048576),[page_view_admin,6] ->
> PartitionFetchInfo(0,1048576),[hi5_image_cleanup,9] ->
> PartitionFetchInfo(0,1048576),[pets_wish,24] ->
> PartitionFetchInfo(0,1048576),[cafe_food_spoiled,23] ->
> PartitionFetchInfo(0,1048576),[pets_wish,33] ->
> PartitionFetchInfo(0,1048576),[account_notifications,11] ->
> PartitionFetchInfo(0,1048576),[google_transactions,7] ->
> PartitionFetchInfo(0,1048576),[hi5_image_cleanup,3] ->
> PartitionFetchInfo(0,1048576),[pets_economy_change,15] ->
> PartitionFetchInfo(0,1048576),[payment,13] ->
> PartitionFetchInfo(0,1048576),[validation,0] ->
> PartitionFetchInfo(0,1048576),[meetme_new_contact_count,5] ->
> PartitionFetchInfo(0,1048576),[mail_send,18] ->
> PartitionFetchInfo(0,1048576),[lightbox_click,28] ->
> PartitionFetchInfo(0,1048576),[rso_scanner_append,5] ->
> PartitionFetchInfo(0,1048576),[mob_gift,2] ->
> PartitionFetchInfo(0,1048576),[cafe_waiter_tips,24] ->
> PartitionFetchInfo(0,1048576),[groups_user_actions,22] ->
> PartitionFetchInfo(0,1048576),[jstiming,23] ->
> PartitionFetchInfo(0,1048576),[viral_contact_inviters,26] ->
> PartitionFetchInfo(0,1048576),[meetme_matches_tab,24] ->
> PartitionFetchInfo(0,1048576),[promotions_targeted_actions,19] ->
> PartitionFetchInfo(0,1048576),[reg_check,15] ->
> PartitionFetchInfo(0,1048576),[cafe_waiters_quit,9] ->
> PartitionFetchInfo(0,1048576),[groups_search,28] ->
> PartitionFetchInfo(0,1048576),[meetme_auto_message_send,15] ->
> PartitionFetchInfo(0,1048576),[virtual_gift,21] ->
> PartitionFetchInfo(0,1048576),[groupchat_room,21] ->
> PartitionFetchInfo(0,1048576),[rapleaf_inter,10] ->
> PartitionFetchInfo(0,1048576),[reg_check,18] ->
> PartitionFetchInfo(0,1048576),[viral_contact_inviters,29] ->
> PartitionFetchInfo(0,1048576),[mail_send,21] ->
> PartitionFetchInfo(0,1048576),[pets_economy_change,0] ->
> PartitionFetchInfo(0,1048576),[pets_classifier_test,19] ->
> PartitionFetchInfo(0,1048576),[cafe_goals,10] ->
> PartitionFetchInfo(0,1048576),[pets_browse,9] ->
> PartitionFetchInfo(0,1048576),[pets_achievements,24] ->
> PartitionFetchInfo(0,1048576),[content_review_preprocess,31] ->
> PartitionFetchInfo(0,1048576),[hi5_image_cleanup,12] ->
> PartitionFetchInfo(0,1048576),[link_render,16] ->
> PartitionFetchInfo(0,1048576),[ent_wt_winners,2] ->
> PartitionFetchInfo(0,1048576),[meetme,23] ->
> PartitionFetchInfo(1178,1048576),[gwallet_notification,30] ->
> PartitionFetchInfo(0,1048576),[m3_login_button,18] ->
> PartitionFetchInfo(0,1048576),[mobile_logout,3] ->
> PartitionFetchInfo(0,1048576),[trialpay_notification,30] ->
> PartitionFetchInfo(0,1048576),[sponsorpay_notification,35] ->
> PartitionFetchInfo(0,1048576),[groupchat_room,6] ->
> PartitionFetchInfo(0,1048576),[pets_reload,30] ->
> PartitionFetchInfo(0,1048576),[report_source,1] ->
> PartitionFetchInfo(0,1048576),[pets_nec_buygold,21] ->
> PartitionFetchInfo(0,1048576),[set_primary_photo,13] ->
> PartitionFetchInfo(0,1048576),[ent_wt_lb_sort,2] ->
> PartitionFetchInfo(0,1048576),[video_shares,24] ->
> PartitionFetchInfo(0,1048576),[viral_contact_importers,13] ->
> PartitionFetchInfo(0,1048576),[click_mail,13] ->
> PartitionFetchInfo(0,1048576),[pets_achievements,6] ->
> PartitionFetchInfo(0,1048576),[newsfeed_likes,10] ->
> PartitionFetchInfo(0,1048576),[pets_achievements,12] ->
> PartitionFetchInfo(0,1048576),[registration_email_domain_scan,31] ->
> PartitionFetchInfo(0,1048576),[photoviewer_message_click,19] ->
> PartitionFetchInfo(0,1048576),[registration_email_domain_scan,1] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_conf,6] ->
> PartitionFetchInfo(0,1048576),[meetme_headliners_purchase,28] ->
> PartitionFetchInfo(0,1048576),[pets_cashflow,33] ->
> PartitionFetchInfo(0,1048576),[top8_feat_users,6] ->
> PartitionFetchInfo(0,1048576),[http_headers,29] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,24] ->
> PartitionFetchInfo(0,1048576),[payment,4] ->
> PartitionFetchInfo(0,1048576),[talk_recommend_click,13] ->
> PartitionFetchInfo(0,1048576),[mob_help,3] ->
> PartitionFetchInfo(0,1048576),[auto_cancel,6] ->
> PartitionFetchInfo(0,1048576),[bouncemail,32] ->
> PartitionFetchInfo(0,1048576),[user_support,23] ->
> PartitionFetchInfo(0,1048576),[groups_search,34] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_spin,16] ->
> PartitionFetchInfo(0,1048576),[trialpay_notification,21] ->
> PartitionFetchInfo(0,1048576),[lightbox_view,6] ->
> PartitionFetchInfo(0,1048576),[gwallet_notification,21] ->
> PartitionFetchInfo(0,1048576),[cafe_unspoils,20] ->
> PartitionFetchInfo(0,1048576),[pets_cashflow,9] ->
> PartitionFetchInfo(0,1048576),[promotions_targeted_actions,34] ->
> PartitionFetchInfo(0,1048576),[chat_login,4] ->
> PartitionFetchInfo(0,1048576),[google_transactions,28] ->
> PartitionFetchInfo(0,1048576),[meetme_new_contact_count,11] ->
> PartitionFetchInfo(0,1048576),[meetme_spotlight_purchase,28] ->
> PartitionFetchInfo(0,1048576),[validation_email_change,24] ->
> PartitionFetchInfo(0,1048576),[meetme_oops,30] ->
> PartitionFetchInfo(0,1048576),[pets_reset_achievement,18] ->
> PartitionFetchInfo(0,1048576),[hi5_blast_photocount,32] ->
> PartitionFetchInfo(0,1048576),[security_trigger,5] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_winners,32] ->
> PartitionFetchInfo(0,1048576),[validation,24] ->
> PartitionFetchInfo(0,1048576),[lightbox_click,31] ->
> PartitionFetchInfo(0,1048576),[viral_landing_page,7] ->
> PartitionFetchInfo(0,1048576),[mobile_performance,26] ->
> PartitionFetchInfo(0,1048576),[viral_reg_pv,28] ->
> PartitionFetchInfo(0,1048576),[statictiming,8] ->
> PartitionFetchInfo(0,1048576),[hi5_blast_photocount,20] ->
> PartitionFetchInfo(0,1048576),[mobile_logout,12] ->
> PartitionFetchInfo(0,1048576),[validation_lightbox_action,5] ->
> PartitionFetchInfo(0,1048576),[cafe_food_served,10] ->
> PartitionFetchInfo(0,1048576),[content_checkboxes,4] ->
> PartitionFetchInfo(0,1048576),[meetme_matches_tab,27] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,27] ->
> PartitionFetchInfo(0,1048576),[auto_cancel,27] ->
> PartitionFetchInfo(0,1048576),[pets_reset_achievement,9] ->
> PartitionFetchInfo(0,1048576),[m3_login_button,9] ->
> PartitionFetchInfo(0,1048576),[chat_login,16] ->
> PartitionFetchInfo(0,1048576),[rapleaf_inter,13] ->
> PartitionFetchInfo(0,1048576),[hi5_image_cleanup,6] ->
> PartitionFetchInfo(0,1048576),[link_review,1] ->
> PartitionFetchInfo(0,1048576),[itunes_notification,6] ->
> PartitionFetchInfo(0,1048576),[m3_session_info,24] ->
> PartitionFetchInfo(0,1048576),[account_notifications,23] ->
> PartitionFetchInfo(0,1048576),[photoviewer_message_click,16] ->
> PartitionFetchInfo(0,1048576),[pets_economy_change,18] ->
> PartitionFetchInfo(0,1048576),[cafe_coin_purchases,9] ->
> PartitionFetchInfo(0,1048576),[virtual_gift,3] ->
> PartitionFetchInfo(0,1048576),[complains,8] ->
> PartitionFetchInfo(0,1048576),[pets_purchased,20] ->
> PartitionFetchInfo(0,1048576),[cafe_waiter_tips,18] ->
> PartitionFetchInfo(0,1048576),[rso_scanner_append,23] ->
> PartitionFetchInfo(0,1048576),[viral_contact_importers,4] ->
> PartitionFetchInfo(0,1048576),[viral_contact_inviters,17] ->
> PartitionFetchInfo(0,1048576),[meetme_matches_tab,21] ->
> PartitionFetchInfo(0,1048576),[viral_invite,31] ->
> PartitionFetchInfo(0,1048576),[page_view_api,13] ->
> PartitionFetchInfo(1143,1048576),[mm_msg,17] ->
> PartitionFetchInfo(0,1048576),[cancel,22] ->
> PartitionFetchInfo(0,1048576),[mobile_performance,20] ->
> PartitionFetchInfo(0,1048576),[set_primary_photo,22] ->
> PartitionFetchInfo(0,1048576),[registration_email_domain_scan,10] ->
> PartitionFetchInfo(0,1048576),[mob_properties,28] ->
> PartitionFetchInfo(0,1048576),[viral_registration,22] ->
> PartitionFetchInfo(0,1048576),[user_contact,9] ->
> PartitionFetchInfo(0,1048576),[rapleaf_image,2] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_spin,25] ->
> PartitionFetchInfo(0,1048576),[cafe_debug,7] ->
> PartitionFetchInfo(0,1048576),[mobile_app_activities,8] ->
> PartitionFetchInfo(0,1048576),[cafe_food_spoiled,20] ->
> PartitionFetchInfo(0,1048576),[validation,9] ->
> PartitionFetchInfo(0,1048576),[pets_achievements,15] ->
> PartitionFetchInfo(0,1048576),[gwallet_notification,3] ->
> PartitionFetchInfo(0,1048576),[index_pageview,33] ->
> PartitionFetchInfo(0,1048576),[security_trigger,23] ->
> PartitionFetchInfo(0,1048576),[link_render,22] ->
> PartitionFetchInfo(0,1048576),[lightbox_click,19] ->
> PartitionFetchInfo(0,1048576),[cafe_waiter_tips,30] ->
> PartitionFetchInfo(0,1048576),[questions,13] ->
> PartitionFetchInfo(0,1048576),[hi5_image_cleanup,15] ->
> PartitionFetchInfo(0,1048576),[phoneflow,31] ->
> PartitionFetchInfo(0,1048576),[pets_buyback_lightbox,18] ->
> PartitionFetchInfo(0,1048576),[mob_premium,6] ->
> PartitionFetchInfo(0,1048576),[cafe_unspoils,5] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_spin,34] ->
> PartitionFetchInfo(0,1048576),[viral_reg_pv,16] ->
> PartitionFetchInfo(0,1048576),[http_headers,23] ->
> PartitionFetchInfo(0,1048576),[viral_landing_page,10] ->
> PartitionFetchInfo(0,1048576),[mob_properties,13] ->
> PartitionFetchInfo(0,1048576),[mobile_api,13] ->
> PartitionFetchInfo(0,1048576),[pets_purchased,32] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_bid,22] ->
> PartitionFetchInfo(0,1048576),[photo_view,22] ->
> PartitionFetchInfo(0,1048576),[viral_import_conf_pv,18] ->
> PartitionFetchInfo(0,1048576),[mob_viral_job,28] ->
> PartitionFetchInfo(0,1048576),[newsfeed_filter_change,22] ->
> PartitionFetchInfo(0,1048576),[password_change,12] ->
> PartitionFetchInfo(0,1048576),[ent_wt_winners,23] ->
> PartitionFetchInfo(0,1048576),[m3_account_cancel,15] ->
> PartitionFetchInfo(0,1048576),[meetme_auto_message_send,6] ->
> PartitionFetchInfo(0,1048576),[pets_cashflow,27] ->
> PartitionFetchInfo(0,1048576),[rapleaf_inter,31] ->
> PartitionFetchInfo(0,1048576),[user_contact,33] ->
> PartitionFetchInfo(0,1048576),[validation_email_change,0] ->
> PartitionFetchInfo(0,1048576),[pets_classifier_pysb,18] ->
> PartitionFetchInfo(0,1048576),[http_headers,14] ->
> PartitionFetchInfo(0,1048576),[meetme_matches_tab,30] ->
> PartitionFetchInfo(0,1048576),[m3_user_block,23] ->
> PartitionFetchInfo(0,1048576),[mob_premium,9] ->
> PartitionFetchInfo(0,1048576),[page_view_api,19] ->
> PartitionFetchInfo(1125,1048576),[photoviewer_like,15] ->
> PartitionFetchInfo(0,1048576),[meetme_oops,27] ->
> PartitionFetchInfo(0,1048576),[photoviewer_message_click,1] ->
> PartitionFetchInfo(0,1048576),[page_view_admin,27] ->
> PartitionFetchInfo(0,1048576),[cafe_waiters_quit,6] ->
> PartitionFetchInfo(0,1048576),[mob_premium,15] ->
> PartitionFetchInfo(0,1048576),[itunes_notification,15] ->
> PartitionFetchInfo(0,1048576),[phoneflow,22] ->
> PartitionFetchInfo(0,1048576),[auto_cancel,30] ->
> PartitionFetchInfo(0,1048576),[skins_library,33] ->
> PartitionFetchInfo(0,1048576),[hi5_blast_photocount,2] ->
> PartitionFetchInfo(0,1048576),[security_trigger,8] ->
> PartitionFetchInfo(0,1048576),[test_assign,0] ->
> PartitionFetchInfo(0,1048576),[test_assign,9] ->
> PartitionFetchInfo(0,1048576),[cafe_badge,3] ->
> PartitionFetchInfo(0,1048576),[mob_properties,31] ->
> PartitionFetchInfo(0,1048576),[pets_classifier_bp,7] ->
> PartitionFetchInfo(0,1048576),[chat_login,13] ->
> PartitionFetchInfo(0,1048576),[cafe_waiters_hire,18] ->
> PartitionFetchInfo(0,1048576),[click_mail,7] ->
> PartitionFetchInfo(0,1048576),[mob_gift,14] ->
> PartitionFetchInfo(0,1048576),[password_change,3] ->
> PartitionFetchInfo(0,1048576),[mm_msg,8] ->
> PartitionFetchInfo(0,1048576),[tagline,1] ->
> PartitionFetchInfo(0,1048576),[meetme_oops,21] ->
> PartitionFetchInfo(0,1048576),[pets_cashflow,18] ->
> PartitionFetchInfo(0,1048576),[newsfeed_likes,22] ->
> PartitionFetchInfo(0,1048576),[video_shares,33] ->
> PartitionFetchInfo(0,1048576),[pets_classifier_test,28] ->
> PartitionFetchInfo(0,1048576),[itunes_notification,33] ->
> PartitionFetchInfo(0,1048576),[trialpay_notification,24] ->
> PartitionFetchInfo(0,1048576),[viral_import,28] ->
> PartitionFetchInfo(0,1048576),[chat_guest,4] ->
> PartitionFetchInfo(0,1048576),[registration_mobile,35] ->
> PartitionFetchInfo(0,1048576),[sponsorpay_notification,20] ->
> PartitionFetchInfo(0,1048576),[tagline,16] ->
> PartitionFetchInfo(0,1048576),[auto_cancel,15] ->
> PartitionFetchInfo(0,1048576),[pets_level,0] ->
> PartitionFetchInfo(0,1048576),[security_trigger,20] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_bid,7] ->
> PartitionFetchInfo(0,1048576),[statictiming,29] ->
> PartitionFetchInfo(0,1048576),[meetme_auto_message_send,21] ->
> PartitionFetchInfo(0,1048576),[pets_cashruns_pageview,29] ->
> PartitionFetchInfo(0,1048576),[newsfeed_likes,13] ->
> PartitionFetchInfo(0,1048576),[page_view_admin,24] ->
> PartitionFetchInfo(0,1048576),[cafe_waiters_hire,30] ->
> PartitionFetchInfo(0,1048576),[matches_bar_view,35] ->
> PartitionFetchInfo(0,1048576),[pets_pageview,31] ->
> PartitionFetchInfo(0,1048576),[external_url,15] ->
> PartitionFetchInfo(0,1048576),[newsfeed_posts,34] ->
> PartitionFetchInfo(0,1048576),[viral_import,13] ->
> PartitionFetchInfo(0,1048576),[pets_classifier_bp,22] ->
> PartitionFetchInfo(0,1048576),[virtual_gift,6] ->
> PartitionFetchInfo(0,1048576),[rapleaf_image,17] ->
> PartitionFetchInfo(0,1048576),[cafe_goals,28] ->
> PartitionFetchInfo(0,1048576),[language_reg,35] ->
> PartitionFetchInfo(0,1048576),[pets_nec_view,12] ->
> PartitionFetchInfo(0,1048576),[pets_economy_change,3] ->
> PartitionFetchInfo(0,1048576),[pets_purchased,26] ->
> PartitionFetchInfo(0,1048576),[mob_properties,10] ->
> PartitionFetchInfo(0,1048576),[mob_gift,26] ->
> PartitionFetchInfo(0,1048576),[registration_email_domain_scan,13] ->
> PartitionFetchInfo(0,1048576),[viral_import_pv,22] ->
> PartitionFetchInfo(0,1048576),[photo_view,28] ->
> PartitionFetchInfo(0,1048576),[videos,10] ->
> PartitionFetchInfo(0,1048576),[cafe_coin_purchases,15] ->
> PartitionFetchInfo(0,1048576),[phoneflow,19] ->
> PartitionFetchInfo(0,1048576),[cafe_debug,19] ->
> PartitionFetchInfo(0,1048576),[pets_level,3] ->
> PartitionFetchInfo(0,1048576),[mob_gift,23] ->
> PartitionFetchInfo(0,1048576),[pets_reset_achievement,27] ->
> PartitionFetchInfo(0,1048576),[m3_profile_views,2] ->
> PartitionFetchInfo(0,1048576),[bouncemail,5] ->
> PartitionFetchInfo(0,1048576),[mob_properties,4] ->
> PartitionFetchInfo(0,1048576),[content_review_preprocess,28] ->
> PartitionFetchInfo(0,1048576),[pysm_click,3] ->
> PartitionFetchInfo(0,1048576),[m3_auth,23] ->
> PartitionFetchInfo(0,1048576),[friend_request,2] ->
> PartitionFetchInfo(0,1048576),[photoviewer_message_click,4] ->
> PartitionFetchInfo(0,1048576),[push_notifications,19] ->
> PartitionFetchInfo(0,1048576),[mob_spend_ap,8] ->
> PartitionFetchInfo(0,1048576),[purchase_actions,3] ->
> PartitionFetchInfo(0,1048576),[promotions_targeted_actions,13] ->
> PartitionFetchInfo(0,1048576),[sponsorpay_notification,23] ->
> PartitionFetchInfo(0,1048576),[cafe_unspoils,8] ->
> PartitionFetchInfo(0,1048576),[meetme_spotlight_action,3] ->
> PartitionFetchInfo(0,1048576),[viral_registration,7] ->
> PartitionFetchInfo(0,1048576),[complains,20] ->
> PartitionFetchInfo(0,1048576),[hi5_image_cleanup,18] ->
> PartitionFetchInfo(0,1048576),[sponsorpay_notification,5] ->
> PartitionFetchInfo(0,1048576),[pets_nec_view,0] ->
> PartitionFetchInfo(0,1048576),[chat_guest,16] ->
> PartitionFetchInfo(0,1048576),[promo_toast_view,1] ->
> PartitionFetchInfo(0,1048576),[cancel,25] ->
> PartitionFetchInfo(0,1048576),[jstiming,8] ->
> PartitionFetchInfo(0,1048576),[pets_wish,12] ->
> PartitionFetchInfo(0,1048576),[rapleaf_image,5] ->
> PartitionFetchInfo(0,1048576),[language_reg,29] ->
> PartitionFetchInfo(0,1048576),[long_query,28] ->
> PartitionFetchInfo(0,1048576),[pets_cashflow,6] ->
> PartitionFetchInfo(0,1048576),[mobile_click,28] ->
> PartitionFetchInfo(0,1048576),[mob_help,18] ->
> PartitionFetchInfo(0,1048576),[talk_recommend_click,22] ->
> PartitionFetchInfo(0,1048576),[set_primary_photo,16] ->
> PartitionFetchInfo(0,1048576),[pets_browse,3] -> Part
> java.net.SocketTimeoutException
>         at
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
>         at
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
>         at kafka.utils.Utils$.read(Utils.scala:394)
>         at
>
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at
> kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at
>
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at
> kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
>         at
>
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
>         at
>
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
>         at
>
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
>         at
>
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at
>
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
>         at
>
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
>         at
>
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
>         at
>
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
>         at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>
>
> [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:18,858] INFO [ReplicaFetcherThread-0-6], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:23,821] INFO Closing socket connection to /10.15.11.24.
> (kafka.network.Processor)
> [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-2-6], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
>
>
>
> On Tue, Aug 27, 2013 at 10:05 AM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > When you said you tried to shutdown the broker, did you try controlled
> > shutdown? Do you see "Shutting down" in the logs of the broker that
> seemed
> > to hang?
> >
> > Thanks,
> > Neha
> >
> >
> > On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <vk...@gmail.com>
> > wrote:
> >
> > > No. They actually were stuck. Not responding to shutdown request. I had
> > to
> > > kill them with kill -9 command. I try to take heap dump which hang as
> > well.
> > >
> > > Sent from my iPhone
> > >
> > > On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:
> > >
> > > > The errors you listed may not be serious, as long as they are
> > transient.
> > > > When you say 2 of the brokers are not responsive, are they issuing
> > fetch
> > > > requests to the 3rd broker (look at the request log)? During a
> restart
> > of
> > > > the whole cluster, brokers that are started later may not have any
> > leader
> > > > and thus won't take any request from the client. You will need to run
> > the
> > > > leader balance tool.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <
> vkeylis2009@gmail.com
> > > >wrote:
> > > >
> > > >> Somehow I am getting my instances of kafka to crash. I started kafka
> > > >> instances one by one and they started successfully. Later it some
> how
> > > two
> > > >> of 3 instances became completely unresponsive. The process is
> running,
> > > but
> > > >> connnection over jmx or taking heat dump not possible. The last one
> > some
> > > >> what resposnive.
> > > >> I am not sure how server get to this state. Is there anything I can
> > > monitor
> > > >> to predict instances about to crash. What are ways to recover
> without
> > > data
> > > >> loss? What am I doing wrong to get to this state. Please advise.
> > > >> I poke around error logs on hosts that are not responsive and here
> are
> > > the
> > > >> errors I found. One that I have not listed LeaderNotFoundExceotion.
> > > >>
> > > >> The most puzzling is about zookeeper as it was not redeployed or
> > > updated.
> > > >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching
> > > metadata
> > > >> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> > > >> kafka.common.ReplicaNotAvailableException
> > > >>        at
> > > >>
> > >
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
> > > >>        at
> > > >>
> > >
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
> > > >>        at
> > > >>
> > > >>
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > > >>        at
> > > >>
> > > >>
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > > >>        at
> > > >>
> > > >>
> > >
> >
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> > > >>        at scala.collection.immutable.List.foreach(List.scala:76)
> > > >>        at
> > > >>
> scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> > > >>
> > > >>
> > > >> in server.log
> > > >> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> > > >> /brokers/topics/meetme/partitions/12/state with data {
> > > >> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> > > >> "version":1 } and expected version 2 failed due to
> > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > KeeperErrorCode =
> > > >> BadVersion for /brokers/topics/meetme/partitions/12/state
> > > >> (kafka.utils.ZkUtils$)
> > > >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5:
> > Cached
> > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > > >> (kafka.cluster.Partition)
> > > >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5:
> > > Shrinking
> > > >> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
> > > >> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> > > >> /brokers/topics/meetme/partitions/4/state with data {
> > > "controller_epoch":6,
> > > >> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and
> expected
> > > >> version 2 failed due to
> > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > KeeperErrorCode =
> > > >> BadVersion for /brokers/topics/meetme/partitions/4/state
> > > >> (kafka.utils.ZkUtils$)
> > > >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5:
> > Cached
> > > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > > >> (kafka.cluster.Partition)
> > > >> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5:
> > > Shrinking
> > > >> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
> > > >> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> > > >> /brokers/topics/meetme/partitions/6/state with data {
> > > "controller_epoch":6,
> > > >> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and
> expected
> > > >> version 3 failed due to
> > > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > > KeeperErrorCode =
> > > >> BadVersion for /brokers/topics/meetme/partitions/6/state
> > > >> (kafka.utils.ZkUtils$)
> > > >> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5:
> > Cached
> > > >> zkVersion [3] not equal to that in zookeeper, skip updating ISR
> > > >> (kafka.cluster.Partition)
> > > >>
> > >
> >
>

Re: Instances became unresponsive

Posted by Vadim Keylis <vk...@gmail.com>.
We do not use controlled shutdown through JMX, its configured in the
property file. I do not see control shutdown message at the time I
initiated the shutdown. However searching for the string produced the
following error messages which happened hours before I started shutting
down service.

[2013-08-26 12:38:18,850] WARN [ReplicaFetcherThread--1-6], Error in fetch
Name: FetchRequest; Version: 0; CorrelationId: 1541; ClientId:
ReplicaFetcherThread--1-6; ReplicaId: 5; MaxWait: 500 ms; MinBytes: 1
bytes; RequestInfo: [pets_pageview,25] ->
PartitionFetchInfo(0,1048576),[mm_msg,26] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_spin,22] ->
PartitionFetchInfo(0,1048576),[page_timings,5] ->
PartitionFetchInfo(0,1048576),[cafe_purchases,13] ->
PartitionFetchInfo(0,1048576),[meetme_spotlight_action,33] ->
PartitionFetchInfo(0,1048576),[mob_gift,5] ->
PartitionFetchInfo(0,1048576),[cafe_coin_purchases,18] ->
PartitionFetchInfo(0,1048576),[security_trigger,11] ->
PartitionFetchInfo(0,1048576),[pysm_click,30] ->
PartitionFetchInfo(0,1048576),[gold_blacklist,21] ->
PartitionFetchInfo(0,1048576),[meetme_oops,9] ->
PartitionFetchInfo(0,1048576),[m3_session_info,15] ->
PartitionFetchInfo(0,1048576),[link_review,10] ->
PartitionFetchInfo(0,1048576),[cafe_debug,28] ->
PartitionFetchInfo(0,1048576),[m3_login_button,0] ->
PartitionFetchInfo(0,1048576),[pets_level,24] ->
PartitionFetchInfo(0,1048576),[login_detail,9] ->
PartitionFetchInfo(0,1048576),[click_mail,31] ->
PartitionFetchInfo(0,1048576),[pets_wish,15] ->
PartitionFetchInfo(0,1048576),[page_view_admin,6] ->
PartitionFetchInfo(0,1048576),[hi5_image_cleanup,9] ->
PartitionFetchInfo(0,1048576),[pets_wish,24] ->
PartitionFetchInfo(0,1048576),[cafe_food_spoiled,23] ->
PartitionFetchInfo(0,1048576),[pets_wish,33] ->
PartitionFetchInfo(0,1048576),[account_notifications,11] ->
PartitionFetchInfo(0,1048576),[google_transactions,7] ->
PartitionFetchInfo(0,1048576),[hi5_image_cleanup,3] ->
PartitionFetchInfo(0,1048576),[pets_economy_change,15] ->
PartitionFetchInfo(0,1048576),[payment,13] ->
PartitionFetchInfo(0,1048576),[validation,0] ->
PartitionFetchInfo(0,1048576),[meetme_new_contact_count,5] ->
PartitionFetchInfo(0,1048576),[mail_send,18] ->
PartitionFetchInfo(0,1048576),[lightbox_click,28] ->
PartitionFetchInfo(0,1048576),[rso_scanner_append,5] ->
PartitionFetchInfo(0,1048576),[mob_gift,2] ->
PartitionFetchInfo(0,1048576),[cafe_waiter_tips,24] ->
PartitionFetchInfo(0,1048576),[groups_user_actions,22] ->
PartitionFetchInfo(0,1048576),[jstiming,23] ->
PartitionFetchInfo(0,1048576),[viral_contact_inviters,26] ->
PartitionFetchInfo(0,1048576),[meetme_matches_tab,24] ->
PartitionFetchInfo(0,1048576),[promotions_targeted_actions,19] ->
PartitionFetchInfo(0,1048576),[reg_check,15] ->
PartitionFetchInfo(0,1048576),[cafe_waiters_quit,9] ->
PartitionFetchInfo(0,1048576),[groups_search,28] ->
PartitionFetchInfo(0,1048576),[meetme_auto_message_send,15] ->
PartitionFetchInfo(0,1048576),[virtual_gift,21] ->
PartitionFetchInfo(0,1048576),[groupchat_room,21] ->
PartitionFetchInfo(0,1048576),[rapleaf_inter,10] ->
PartitionFetchInfo(0,1048576),[reg_check,18] ->
PartitionFetchInfo(0,1048576),[viral_contact_inviters,29] ->
PartitionFetchInfo(0,1048576),[mail_send,21] ->
PartitionFetchInfo(0,1048576),[pets_economy_change,0] ->
PartitionFetchInfo(0,1048576),[pets_classifier_test,19] ->
PartitionFetchInfo(0,1048576),[cafe_goals,10] ->
PartitionFetchInfo(0,1048576),[pets_browse,9] ->
PartitionFetchInfo(0,1048576),[pets_achievements,24] ->
PartitionFetchInfo(0,1048576),[content_review_preprocess,31] ->
PartitionFetchInfo(0,1048576),[hi5_image_cleanup,12] ->
PartitionFetchInfo(0,1048576),[link_render,16] ->
PartitionFetchInfo(0,1048576),[ent_wt_winners,2] ->
PartitionFetchInfo(0,1048576),[meetme,23] ->
PartitionFetchInfo(1178,1048576),[gwallet_notification,30] ->
PartitionFetchInfo(0,1048576),[m3_login_button,18] ->
PartitionFetchInfo(0,1048576),[mobile_logout,3] ->
PartitionFetchInfo(0,1048576),[trialpay_notification,30] ->
PartitionFetchInfo(0,1048576),[sponsorpay_notification,35] ->
PartitionFetchInfo(0,1048576),[groupchat_room,6] ->
PartitionFetchInfo(0,1048576),[pets_reload,30] ->
PartitionFetchInfo(0,1048576),[report_source,1] ->
PartitionFetchInfo(0,1048576),[pets_nec_buygold,21] ->
PartitionFetchInfo(0,1048576),[set_primary_photo,13] ->
PartitionFetchInfo(0,1048576),[ent_wt_lb_sort,2] ->
PartitionFetchInfo(0,1048576),[video_shares,24] ->
PartitionFetchInfo(0,1048576),[viral_contact_importers,13] ->
PartitionFetchInfo(0,1048576),[click_mail,13] ->
PartitionFetchInfo(0,1048576),[pets_achievements,6] ->
PartitionFetchInfo(0,1048576),[newsfeed_likes,10] ->
PartitionFetchInfo(0,1048576),[pets_achievements,12] ->
PartitionFetchInfo(0,1048576),[registration_email_domain_scan,31] ->
PartitionFetchInfo(0,1048576),[photoviewer_message_click,19] ->
PartitionFetchInfo(0,1048576),[registration_email_domain_scan,1] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_conf,6] ->
PartitionFetchInfo(0,1048576),[meetme_headliners_purchase,28] ->
PartitionFetchInfo(0,1048576),[pets_cashflow,33] ->
PartitionFetchInfo(0,1048576),[top8_feat_users,6] ->
PartitionFetchInfo(0,1048576),[http_headers,29] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,24] ->
PartitionFetchInfo(0,1048576),[payment,4] ->
PartitionFetchInfo(0,1048576),[talk_recommend_click,13] ->
PartitionFetchInfo(0,1048576),[mob_help,3] ->
PartitionFetchInfo(0,1048576),[auto_cancel,6] ->
PartitionFetchInfo(0,1048576),[bouncemail,32] ->
PartitionFetchInfo(0,1048576),[user_support,23] ->
PartitionFetchInfo(0,1048576),[groups_search,34] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_spin,16] ->
PartitionFetchInfo(0,1048576),[trialpay_notification,21] ->
PartitionFetchInfo(0,1048576),[lightbox_view,6] ->
PartitionFetchInfo(0,1048576),[gwallet_notification,21] ->
PartitionFetchInfo(0,1048576),[cafe_unspoils,20] ->
PartitionFetchInfo(0,1048576),[pets_cashflow,9] ->
PartitionFetchInfo(0,1048576),[promotions_targeted_actions,34] ->
PartitionFetchInfo(0,1048576),[chat_login,4] ->
PartitionFetchInfo(0,1048576),[google_transactions,28] ->
PartitionFetchInfo(0,1048576),[meetme_new_contact_count,11] ->
PartitionFetchInfo(0,1048576),[meetme_spotlight_purchase,28] ->
PartitionFetchInfo(0,1048576),[validation_email_change,24] ->
PartitionFetchInfo(0,1048576),[meetme_oops,30] ->
PartitionFetchInfo(0,1048576),[pets_reset_achievement,18] ->
PartitionFetchInfo(0,1048576),[hi5_blast_photocount,32] ->
PartitionFetchInfo(0,1048576),[security_trigger,5] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_winners,32] ->
PartitionFetchInfo(0,1048576),[validation,24] ->
PartitionFetchInfo(0,1048576),[lightbox_click,31] ->
PartitionFetchInfo(0,1048576),[viral_landing_page,7] ->
PartitionFetchInfo(0,1048576),[mobile_performance,26] ->
PartitionFetchInfo(0,1048576),[viral_reg_pv,28] ->
PartitionFetchInfo(0,1048576),[statictiming,8] ->
PartitionFetchInfo(0,1048576),[hi5_blast_photocount,20] ->
PartitionFetchInfo(0,1048576),[mobile_logout,12] ->
PartitionFetchInfo(0,1048576),[validation_lightbox_action,5] ->
PartitionFetchInfo(0,1048576),[cafe_food_served,10] ->
PartitionFetchInfo(0,1048576),[content_checkboxes,4] ->
PartitionFetchInfo(0,1048576),[meetme_matches_tab,27] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_purchase,27] ->
PartitionFetchInfo(0,1048576),[auto_cancel,27] ->
PartitionFetchInfo(0,1048576),[pets_reset_achievement,9] ->
PartitionFetchInfo(0,1048576),[m3_login_button,9] ->
PartitionFetchInfo(0,1048576),[chat_login,16] ->
PartitionFetchInfo(0,1048576),[rapleaf_inter,13] ->
PartitionFetchInfo(0,1048576),[hi5_image_cleanup,6] ->
PartitionFetchInfo(0,1048576),[link_review,1] ->
PartitionFetchInfo(0,1048576),[itunes_notification,6] ->
PartitionFetchInfo(0,1048576),[m3_session_info,24] ->
PartitionFetchInfo(0,1048576),[account_notifications,23] ->
PartitionFetchInfo(0,1048576),[photoviewer_message_click,16] ->
PartitionFetchInfo(0,1048576),[pets_economy_change,18] ->
PartitionFetchInfo(0,1048576),[cafe_coin_purchases,9] ->
PartitionFetchInfo(0,1048576),[virtual_gift,3] ->
PartitionFetchInfo(0,1048576),[complains,8] ->
PartitionFetchInfo(0,1048576),[pets_purchased,20] ->
PartitionFetchInfo(0,1048576),[cafe_waiter_tips,18] ->
PartitionFetchInfo(0,1048576),[rso_scanner_append,23] ->
PartitionFetchInfo(0,1048576),[viral_contact_importers,4] ->
PartitionFetchInfo(0,1048576),[viral_contact_inviters,17] ->
PartitionFetchInfo(0,1048576),[meetme_matches_tab,21] ->
PartitionFetchInfo(0,1048576),[viral_invite,31] ->
PartitionFetchInfo(0,1048576),[page_view_api,13] ->
PartitionFetchInfo(1143,1048576),[mm_msg,17] ->
PartitionFetchInfo(0,1048576),[cancel,22] ->
PartitionFetchInfo(0,1048576),[mobile_performance,20] ->
PartitionFetchInfo(0,1048576),[set_primary_photo,22] ->
PartitionFetchInfo(0,1048576),[registration_email_domain_scan,10] ->
PartitionFetchInfo(0,1048576),[mob_properties,28] ->
PartitionFetchInfo(0,1048576),[viral_registration,22] ->
PartitionFetchInfo(0,1048576),[user_contact,9] ->
PartitionFetchInfo(0,1048576),[rapleaf_image,2] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_spin,25] ->
PartitionFetchInfo(0,1048576),[cafe_debug,7] ->
PartitionFetchInfo(0,1048576),[mobile_app_activities,8] ->
PartitionFetchInfo(0,1048576),[cafe_food_spoiled,20] ->
PartitionFetchInfo(0,1048576),[validation,9] ->
PartitionFetchInfo(0,1048576),[pets_achievements,15] ->
PartitionFetchInfo(0,1048576),[gwallet_notification,3] ->
PartitionFetchInfo(0,1048576),[index_pageview,33] ->
PartitionFetchInfo(0,1048576),[security_trigger,23] ->
PartitionFetchInfo(0,1048576),[link_render,22] ->
PartitionFetchInfo(0,1048576),[lightbox_click,19] ->
PartitionFetchInfo(0,1048576),[cafe_waiter_tips,30] ->
PartitionFetchInfo(0,1048576),[questions,13] ->
PartitionFetchInfo(0,1048576),[hi5_image_cleanup,15] ->
PartitionFetchInfo(0,1048576),[phoneflow,31] ->
PartitionFetchInfo(0,1048576),[pets_buyback_lightbox,18] ->
PartitionFetchInfo(0,1048576),[mob_premium,6] ->
PartitionFetchInfo(0,1048576),[cafe_unspoils,5] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_spin,34] ->
PartitionFetchInfo(0,1048576),[viral_reg_pv,16] ->
PartitionFetchInfo(0,1048576),[http_headers,23] ->
PartitionFetchInfo(0,1048576),[viral_landing_page,10] ->
PartitionFetchInfo(0,1048576),[mob_properties,13] ->
PartitionFetchInfo(0,1048576),[mobile_api,13] ->
PartitionFetchInfo(0,1048576),[pets_purchased,32] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_bid,22] ->
PartitionFetchInfo(0,1048576),[photo_view,22] ->
PartitionFetchInfo(0,1048576),[viral_import_conf_pv,18] ->
PartitionFetchInfo(0,1048576),[mob_viral_job,28] ->
PartitionFetchInfo(0,1048576),[newsfeed_filter_change,22] ->
PartitionFetchInfo(0,1048576),[password_change,12] ->
PartitionFetchInfo(0,1048576),[ent_wt_winners,23] ->
PartitionFetchInfo(0,1048576),[m3_account_cancel,15] ->
PartitionFetchInfo(0,1048576),[meetme_auto_message_send,6] ->
PartitionFetchInfo(0,1048576),[pets_cashflow,27] ->
PartitionFetchInfo(0,1048576),[rapleaf_inter,31] ->
PartitionFetchInfo(0,1048576),[user_contact,33] ->
PartitionFetchInfo(0,1048576),[validation_email_change,0] ->
PartitionFetchInfo(0,1048576),[pets_classifier_pysb,18] ->
PartitionFetchInfo(0,1048576),[http_headers,14] ->
PartitionFetchInfo(0,1048576),[meetme_matches_tab,30] ->
PartitionFetchInfo(0,1048576),[m3_user_block,23] ->
PartitionFetchInfo(0,1048576),[mob_premium,9] ->
PartitionFetchInfo(0,1048576),[page_view_api,19] ->
PartitionFetchInfo(1125,1048576),[photoviewer_like,15] ->
PartitionFetchInfo(0,1048576),[meetme_oops,27] ->
PartitionFetchInfo(0,1048576),[photoviewer_message_click,1] ->
PartitionFetchInfo(0,1048576),[page_view_admin,27] ->
PartitionFetchInfo(0,1048576),[cafe_waiters_quit,6] ->
PartitionFetchInfo(0,1048576),[mob_premium,15] ->
PartitionFetchInfo(0,1048576),[itunes_notification,15] ->
PartitionFetchInfo(0,1048576),[phoneflow,22] ->
PartitionFetchInfo(0,1048576),[auto_cancel,30] ->
PartitionFetchInfo(0,1048576),[skins_library,33] ->
PartitionFetchInfo(0,1048576),[hi5_blast_photocount,2] ->
PartitionFetchInfo(0,1048576),[security_trigger,8] ->
PartitionFetchInfo(0,1048576),[test_assign,0] ->
PartitionFetchInfo(0,1048576),[test_assign,9] ->
PartitionFetchInfo(0,1048576),[cafe_badge,3] ->
PartitionFetchInfo(0,1048576),[mob_properties,31] ->
PartitionFetchInfo(0,1048576),[pets_classifier_bp,7] ->
PartitionFetchInfo(0,1048576),[chat_login,13] ->
PartitionFetchInfo(0,1048576),[cafe_waiters_hire,18] ->
PartitionFetchInfo(0,1048576),[click_mail,7] ->
PartitionFetchInfo(0,1048576),[mob_gift,14] ->
PartitionFetchInfo(0,1048576),[password_change,3] ->
PartitionFetchInfo(0,1048576),[mm_msg,8] ->
PartitionFetchInfo(0,1048576),[tagline,1] ->
PartitionFetchInfo(0,1048576),[meetme_oops,21] ->
PartitionFetchInfo(0,1048576),[pets_cashflow,18] ->
PartitionFetchInfo(0,1048576),[newsfeed_likes,22] ->
PartitionFetchInfo(0,1048576),[video_shares,33] ->
PartitionFetchInfo(0,1048576),[pets_classifier_test,28] ->
PartitionFetchInfo(0,1048576),[itunes_notification,33] ->
PartitionFetchInfo(0,1048576),[trialpay_notification,24] ->
PartitionFetchInfo(0,1048576),[viral_import,28] ->
PartitionFetchInfo(0,1048576),[chat_guest,4] ->
PartitionFetchInfo(0,1048576),[registration_mobile,35] ->
PartitionFetchInfo(0,1048576),[sponsorpay_notification,20] ->
PartitionFetchInfo(0,1048576),[tagline,16] ->
PartitionFetchInfo(0,1048576),[auto_cancel,15] ->
PartitionFetchInfo(0,1048576),[pets_level,0] ->
PartitionFetchInfo(0,1048576),[security_trigger,20] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_bid,7] ->
PartitionFetchInfo(0,1048576),[statictiming,29] ->
PartitionFetchInfo(0,1048576),[meetme_auto_message_send,21] ->
PartitionFetchInfo(0,1048576),[pets_cashruns_pageview,29] ->
PartitionFetchInfo(0,1048576),[newsfeed_likes,13] ->
PartitionFetchInfo(0,1048576),[page_view_admin,24] ->
PartitionFetchInfo(0,1048576),[cafe_waiters_hire,30] ->
PartitionFetchInfo(0,1048576),[matches_bar_view,35] ->
PartitionFetchInfo(0,1048576),[pets_pageview,31] ->
PartitionFetchInfo(0,1048576),[external_url,15] ->
PartitionFetchInfo(0,1048576),[newsfeed_posts,34] ->
PartitionFetchInfo(0,1048576),[viral_import,13] ->
PartitionFetchInfo(0,1048576),[pets_classifier_bp,22] ->
PartitionFetchInfo(0,1048576),[virtual_gift,6] ->
PartitionFetchInfo(0,1048576),[rapleaf_image,17] ->
PartitionFetchInfo(0,1048576),[cafe_goals,28] ->
PartitionFetchInfo(0,1048576),[language_reg,35] ->
PartitionFetchInfo(0,1048576),[pets_nec_view,12] ->
PartitionFetchInfo(0,1048576),[pets_economy_change,3] ->
PartitionFetchInfo(0,1048576),[pets_purchased,26] ->
PartitionFetchInfo(0,1048576),[mob_properties,10] ->
PartitionFetchInfo(0,1048576),[mob_gift,26] ->
PartitionFetchInfo(0,1048576),[registration_email_domain_scan,13] ->
PartitionFetchInfo(0,1048576),[viral_import_pv,22] ->
PartitionFetchInfo(0,1048576),[photo_view,28] ->
PartitionFetchInfo(0,1048576),[videos,10] ->
PartitionFetchInfo(0,1048576),[cafe_coin_purchases,15] ->
PartitionFetchInfo(0,1048576),[phoneflow,19] ->
PartitionFetchInfo(0,1048576),[cafe_debug,19] ->
PartitionFetchInfo(0,1048576),[pets_level,3] ->
PartitionFetchInfo(0,1048576),[mob_gift,23] ->
PartitionFetchInfo(0,1048576),[pets_reset_achievement,27] ->
PartitionFetchInfo(0,1048576),[m3_profile_views,2] ->
PartitionFetchInfo(0,1048576),[bouncemail,5] ->
PartitionFetchInfo(0,1048576),[mob_properties,4] ->
PartitionFetchInfo(0,1048576),[content_review_preprocess,28] ->
PartitionFetchInfo(0,1048576),[pysm_click,3] ->
PartitionFetchInfo(0,1048576),[m3_auth,23] ->
PartitionFetchInfo(0,1048576),[friend_request,2] ->
PartitionFetchInfo(0,1048576),[photoviewer_message_click,4] ->
PartitionFetchInfo(0,1048576),[push_notifications,19] ->
PartitionFetchInfo(0,1048576),[mob_spend_ap,8] ->
PartitionFetchInfo(0,1048576),[purchase_actions,3] ->
PartitionFetchInfo(0,1048576),[promotions_targeted_actions,13] ->
PartitionFetchInfo(0,1048576),[sponsorpay_notification,23] ->
PartitionFetchInfo(0,1048576),[cafe_unspoils,8] ->
PartitionFetchInfo(0,1048576),[meetme_spotlight_action,3] ->
PartitionFetchInfo(0,1048576),[viral_registration,7] ->
PartitionFetchInfo(0,1048576),[complains,20] ->
PartitionFetchInfo(0,1048576),[hi5_image_cleanup,18] ->
PartitionFetchInfo(0,1048576),[sponsorpay_notification,5] ->
PartitionFetchInfo(0,1048576),[pets_nec_view,0] ->
PartitionFetchInfo(0,1048576),[chat_guest,16] ->
PartitionFetchInfo(0,1048576),[promo_toast_view,1] ->
PartitionFetchInfo(0,1048576),[cancel,25] ->
PartitionFetchInfo(0,1048576),[jstiming,8] ->
PartitionFetchInfo(0,1048576),[pets_wish,12] ->
PartitionFetchInfo(0,1048576),[rapleaf_image,5] ->
PartitionFetchInfo(0,1048576),[language_reg,29] ->
PartitionFetchInfo(0,1048576),[long_query,28] ->
PartitionFetchInfo(0,1048576),[pets_cashflow,6] ->
PartitionFetchInfo(0,1048576),[mobile_click,28] ->
PartitionFetchInfo(0,1048576),[mob_help,18] ->
PartitionFetchInfo(0,1048576),[talk_recommend_click,22] ->
PartitionFetchInfo(0,1048576),[set_primary_photo,16] ->
PartitionFetchInfo(0,1048576),[pets_browse,3] -> Part
java.net.SocketTimeoutException
        at
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
        at
java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
        at kafka.utils.Utils$.read(Utils.scala:394)
        at
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        at
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
        at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
        at
kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
        at
kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
        at
kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
        at
kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
        at
kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at
kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
        at
kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
        at
kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
        at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
        at
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)


[2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Stopped
 (kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:18,854] INFO [ReplicaFetcherThread--1-6], Shutdown
completed (kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:18,858] INFO [ReplicaFetcherThread-0-6], Shutting down
(kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:23,821] INFO Closing socket connection to /10.15.11.24.
(kafka.network.Processor)
[2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Stopped
 (kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-0-6], Shutdown
completed (kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:23,899] INFO [ReplicaFetcherThread-2-6], Shutting down
(kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Stopped
 (kafka.server.ReplicaFetcherThread)
[2013-08-26 12:38:24,064] INFO [ReplicaFetcherThread-2-6], Shutdown
completed (kafka.server.ReplicaFetcherThread)



On Tue, Aug 27, 2013 at 10:05 AM, Neha Narkhede <ne...@gmail.com>wrote:

> When you said you tried to shutdown the broker, did you try controlled
> shutdown? Do you see "Shutting down" in the logs of the broker that seemed
> to hang?
>
> Thanks,
> Neha
>
>
> On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <vk...@gmail.com>
> wrote:
>
> > No. They actually were stuck. Not responding to shutdown request. I had
> to
> > kill them with kill -9 command. I try to take heap dump which hang as
> well.
> >
> > Sent from my iPhone
> >
> > On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:
> >
> > > The errors you listed may not be serious, as long as they are
> transient.
> > > When you say 2 of the brokers are not responsive, are they issuing
> fetch
> > > requests to the 3rd broker (look at the request log)? During a restart
> of
> > > the whole cluster, brokers that are started later may not have any
> leader
> > > and thus won't take any request from the client. You will need to run
> the
> > > leader balance tool.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <vkeylis2009@gmail.com
> > >wrote:
> > >
> > >> Somehow I am getting my instances of kafka to crash. I started kafka
> > >> instances one by one and they started successfully. Later it some how
> > two
> > >> of 3 instances became completely unresponsive. The process is running,
> > but
> > >> connnection over jmx or taking heat dump not possible. The last one
> some
> > >> what resposnive.
> > >> I am not sure how server get to this state. Is there anything I can
> > monitor
> > >> to predict instances about to crash. What are ways to recover without
> > data
> > >> loss? What am I doing wrong to get to this state. Please advise.
> > >> I poke around error logs on hosts that are not responsive and here are
> > the
> > >> errors I found. One that I have not listed LeaderNotFoundExceotion.
> > >>
> > >> The most puzzling is about zookeeper as it was not redeployed or
> > updated.
> > >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching
> > metadata
> > >> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> > >> kafka.common.ReplicaNotAvailableException
> > >>        at
> > >>
> > kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
> > >>        at
> > >>
> > kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
> > >>        at
> > >>
> > >>
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > >>        at
> > >>
> > >>
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> > >>        at
> > >>
> > >>
> >
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> > >>        at scala.collection.immutable.List.foreach(List.scala:76)
> > >>        at
> > >> scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> > >>
> > >>
> > >> in server.log
> > >> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> > >> /brokers/topics/meetme/partitions/12/state with data {
> > >> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> > >> "version":1 } and expected version 2 failed due to
> > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > KeeperErrorCode =
> > >> BadVersion for /brokers/topics/meetme/partitions/12/state
> > >> (kafka.utils.ZkUtils$)
> > >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5:
> Cached
> > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > >> (kafka.cluster.Partition)
> > >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5:
> > Shrinking
> > >> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
> > >> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> > >> /brokers/topics/meetme/partitions/4/state with data {
> > "controller_epoch":6,
> > >> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
> > >> version 2 failed due to
> > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > KeeperErrorCode =
> > >> BadVersion for /brokers/topics/meetme/partitions/4/state
> > >> (kafka.utils.ZkUtils$)
> > >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5:
> Cached
> > >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> > >> (kafka.cluster.Partition)
> > >> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5:
> > Shrinking
> > >> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
> > >> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> > >> /brokers/topics/meetme/partitions/6/state with data {
> > "controller_epoch":6,
> > >> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
> > >> version 3 failed due to
> > >> org.apache.zookeeper.KeeperException$BadVersionException:
> > KeeperErrorCode =
> > >> BadVersion for /brokers/topics/meetme/partitions/6/state
> > >> (kafka.utils.ZkUtils$)
> > >> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5:
> Cached
> > >> zkVersion [3] not equal to that in zookeeper, skip updating ISR
> > >> (kafka.cluster.Partition)
> > >>
> >
>

Re: Instances became unresponsive

Posted by Neha Narkhede <ne...@gmail.com>.
When you said you tried to shutdown the broker, did you try controlled
shutdown? Do you see "Shutting down" in the logs of the broker that seemed
to hang?

Thanks,
Neha


On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <vk...@gmail.com> wrote:

> No. They actually were stuck. Not responding to shutdown request. I had to
> kill them with kill -9 command. I try to take heap dump which hang as well.
>
> Sent from my iPhone
>
> On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > The errors you listed may not be serious, as long as they are transient.
> > When you say 2 of the brokers are not responsive, are they issuing fetch
> > requests to the 3rd broker (look at the request log)? During a restart of
> > the whole cluster, brokers that are started later may not have any leader
> > and thus won't take any request from the client. You will need to run the
> > leader balance tool.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <vkeylis2009@gmail.com
> >wrote:
> >
> >> Somehow I am getting my instances of kafka to crash. I started kafka
> >> instances one by one and they started successfully. Later it some how
> two
> >> of 3 instances became completely unresponsive. The process is running,
> but
> >> connnection over jmx or taking heat dump not possible. The last one some
> >> what resposnive.
> >> I am not sure how server get to this state. Is there anything I can
> monitor
> >> to predict instances about to crash. What are ways to recover without
> data
> >> loss? What am I doing wrong to get to this state. Please advise.
> >> I poke around error logs on hosts that are not responsive and here are
> the
> >> errors I found. One that I have not listed LeaderNotFoundExceotion.
> >>
> >> The most puzzling is about zookeeper as it was not redeployed or
> updated.
> >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching
> metadata
> >> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> >> kafka.common.ReplicaNotAvailableException
> >>        at
> >>
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
> >>        at
> >>
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
> >>        at
> >>
> >>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> >>        at
> >>
> >>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> >>        at
> >>
> >>
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> >>        at scala.collection.immutable.List.foreach(List.scala:76)
> >>        at
> >> scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> >>
> >>
> >> in server.log
> >> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> >> /brokers/topics/meetme/partitions/12/state with data {
> >> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> >> "version":1 } and expected version 2 failed due to
> >> org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode =
> >> BadVersion for /brokers/topics/meetme/partitions/12/state
> >> (kafka.utils.ZkUtils$)
> >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
> >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> >> (kafka.cluster.Partition)
> >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5:
> Shrinking
> >> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
> >> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> >> /brokers/topics/meetme/partitions/4/state with data {
> "controller_epoch":6,
> >> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
> >> version 2 failed due to
> >> org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode =
> >> BadVersion for /brokers/topics/meetme/partitions/4/state
> >> (kafka.utils.ZkUtils$)
> >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
> >> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> >> (kafka.cluster.Partition)
> >> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5:
> Shrinking
> >> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
> >> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> >> /brokers/topics/meetme/partitions/6/state with data {
> "controller_epoch":6,
> >> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
> >> version 3 failed due to
> >> org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode =
> >> BadVersion for /brokers/topics/meetme/partitions/6/state
> >> (kafka.utils.ZkUtils$)
> >> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5: Cached
> >> zkVersion [3] not equal to that in zookeeper, skip updating ISR
> >> (kafka.cluster.Partition)
> >>
>

Re: Instances became unresponsive

Posted by Vadim Keylis <vk...@gmail.com>.
No. They actually were stuck. Not responding to shutdown request. I had to kill them with kill -9 command. I try to take heap dump which hang as well.

Sent from my iPhone

On Aug 27, 2013, at 8:14 AM, Jun Rao <ju...@gmail.com> wrote:

> The errors you listed may not be serious, as long as they are transient.
> When you say 2 of the brokers are not responsive, are they issuing fetch
> requests to the 3rd broker (look at the request log)? During a restart of
> the whole cluster, brokers that are started later may not have any leader
> and thus won't take any request from the client. You will need to run the
> leader balance tool.
> 
> Thanks,
> 
> Jun
> 
> 
> On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <vk...@gmail.com>wrote:
> 
>> Somehow I am getting my instances of kafka to crash. I started kafka
>> instances one by one and they started successfully. Later it some how two
>> of 3 instances became completely unresponsive. The process is running, but
>> connnection over jmx or taking heat dump not possible. The last one some
>> what resposnive.
>> I am not sure how server get to this state. Is there anything I can monitor
>> to predict instances about to crash. What are ways to recover without data
>> loss? What am I doing wrong to get to this state. Please advise.
>> I poke around error logs on hosts that are not responsive and here are the
>> errors I found. One that I have not listed LeaderNotFoundExceotion.
>> 
>> The most puzzling is about zookeeper as it was not redeployed or updated.
>> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata
>> for partition [self_reactivation,0] (kafka.server.KafkaApis)
>> kafka.common.ReplicaNotAvailableException
>>        at
>> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
>>        at
>> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
>>        at
>> 
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>        at
>> 
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>        at
>> 
>> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>>        at scala.collection.immutable.List.foreach(List.scala:76)
>>        at
>> scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>> 
>> 
>> in server.log
>> [2013-08-26 21:00:51,942] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/12/state with data {
>> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
>> "version":1 } and expected version 2 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/12/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
>> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking
>> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
>> [2013-08-26 21:00:51,993] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6,
>> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
>> version 2 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/4/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
>> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking
>> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
>> [2013-08-26 21:00:52,107] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6,
>> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
>> version 3 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/6/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5: Cached
>> zkVersion [3] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> 

Re: Instances became unresponsive

Posted by Jun Rao <ju...@gmail.com>.
The errors you listed may not be serious, as long as they are transient.
When you say 2 of the brokers are not responsive, are they issuing fetch
requests to the 3rd broker (look at the request log)? During a restart of
the whole cluster, brokers that are started later may not have any leader
and thus won't take any request from the client. You will need to run the
leader balance tool.

Thanks,

Jun


On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <vk...@gmail.com>wrote:

> Somehow I am getting my instances of kafka to crash. I started kafka
> instances one by one and they started successfully. Later it some how two
> of 3 instances became completely unresponsive. The process is running, but
> connnection over jmx or taking heat dump not possible. The last one some
> what resposnive.
> I am not sure how server get to this state. Is there anything I can monitor
> to predict instances about to crash. What are ways to recover without data
> loss? What am I doing wrong to get to this state. Please advise.
> I poke around error logs on hosts that are not responsive and here are the
> errors I found. One that I have not listed LeaderNotFoundExceotion.
>
>  The most puzzling is about zookeeper as it was not redeployed or updated.
> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata
> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> kafka.common.ReplicaNotAvailableException
>         at
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
>         at
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
>         at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>         at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>         at
>
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>         at scala.collection.immutable.List.foreach(List.scala:76)
>         at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>
>
> in server.log
> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> /brokers/topics/meetme/partitions/12/state with data {
> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> "version":1 } and expected version 2 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/meetme/partitions/12/state
> (kafka.utils.ZkUtils$)
> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking
> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6,
> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
> version 2 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/meetme/partitions/4/state
> (kafka.utils.ZkUtils$)
> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking
> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> /brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6,
> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
> version 3 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/meetme/partitions/6/state
> (kafka.utils.ZkUtils$)
> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5: Cached
> zkVersion [3] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
>