Posted to commits@samza.apache.org by "Akim Akimov (JIRA)" <ji...@apache.org> on 2018/05/31 13:54:00 UTC
[jira] [Updated] (SAMZA-1739) Some containers hangs on restore KV store
[ https://issues.apache.org/jira/browse/SAMZA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akim Akimov updated SAMZA-1739:
-------------------------------
Description:
We hit a problem in our prod environment that we could not reproduce in the dev environment.
The issue is that on application restart, 4 of the 12 containers hang while restoring the KV store from its changelog.
Application configuration:
12 containers deployed with YARN. The KV store in question is the window aggregation KV store.
This is how it manifests:
{{2018-05-29 16:01:55,650 [main] INFO org.apache.samza.storage.TaskStorageManager - Assigning oldest change log offsets for taskName Partition 8: Map(SystemStream [system=kafka, stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
{{2018-05-29 16:01:55,653 [main] INFO org.apache.samza.storage.TaskStorageManager - Registering change log consumer with offset 0 for SystemStreamPartition [kafka, chainstream_one-1-window-window_cid_batch, 10].}}
{{2018-05-29 16:01:55,654 [main] INFO org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for: Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
{{2018-05-29 16:01:55,655 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Creating new SimpleConsumer for host ip-x.us-east-1.code418.net:9092 for system kafka}}
{{2018-05-29 16:01:55,656 [main] INFO org.apache.samza.system.kafka.GetOffset - Validating offset 0 for topic and partition [chainstream_one-1-window-window_cid_batch,10]}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.GetOffset - Able to successfully read from offset 0 for topic and partition [chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate consumer.}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Starting BrokerProxy for ip-x.us-east-1.code418.net:9092}}
{{2018-05-29 16:01:58,129 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries restored...}}
{{2018-05-29 16:01:59,707 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries restored...}}
{{2018-05-29 16:02:01,318 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries restored...}}
{{2018-05-29 16:02:02,920 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries restored...}}
{{End of LogType:stdout. This log file belongs to a running container (}}
The other containers start normally:
{{2018-05-29 16:02:18,564 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries restored...}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Shutting down BrokerProxy for ip-x.net:9092}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy - closing simple consumer...}}
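To compare a hung container with a healthy one, it may help to pull the restore progress lines out of each container log. The following is a minimal sketch, not part of Samza: it assumes the log line format shown above, and the names `restore_progress`, `summarize` and `SAMPLE_LINES` are made up for illustration. It only reports where the "entries restored" messages stop and at what average rate they arrived; it cannot tell a hang from a slow restore, so a thread dump of the stuck container would still be needed.

```python
import re
from datetime import datetime

# Matches progress lines such as:
# 2018-05-29 16:01:58,129 [main] INFO ...KeyValueStorageEngine - 1000000 entries restored...
RESTORE_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"KeyValueStorageEngine - (\d+) entries restored"
)


def restore_progress(log_lines):
    """Return (timestamp, entries_restored) pairs in the order they appear."""
    progress = []
    for line in log_lines:
        match = RESTORE_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            progress.append((stamp, int(match.group(2))))
    return progress


def summarize(progress):
    """Return (last_timestamp, last_count, entries_per_second) or None.

    The rate is averaged between the first and last progress line and is
    None when it cannot be computed (fewer than two distinct timestamps).
    """
    if not progress:
        return None
    first_stamp, first_count = progress[0]
    last_stamp, last_count = progress[-1]
    elapsed = (last_stamp - first_stamp).total_seconds()
    rate = (last_count - first_count) / elapsed if elapsed > 0 else None
    return last_stamp, last_count, rate


# Progress lines copied from the hung container's log above.
SAMPLE_LINES = [
    "2018-05-29 16:01:58,129 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries restored...",
    "2018-05-29 16:01:59,707 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries restored...",
    "2018-05-29 16:02:01,318 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries restored...",
    "2018-05-29 16:02:02,920 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries restored...",
]

if __name__ == "__main__":
    last_stamp, last_count, rate = summarize(restore_progress(SAMPLE_LINES))
    print("last progress line:", last_stamp, "entries restored:", last_count)
    print("average entries per second:", rate)
```

Running the same functions over the full log of a container that started normally gives the point at which its restore finished, which can then be compared with the last progress line of each hung container.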
System:
|Samza|0.14|
|kafka.x86_64|0.11.0.1-1|
|YARN|2.7.3|
|ZooKeeper|3.4.6|
> Some containers hangs on restore KV store
> -----------------------------------------
>
> Key: SAMZA-1739
> URL: https://issues.apache.org/jira/browse/SAMZA-1739
> Project: Samza
> Issue Type: Bug
> Components: kafka, kv-store
> Affects Versions: 0.14.0
> Reporter: Akim Akimov
> Priority: Major