Posted to commits@samza.apache.org by "Akim Akimov (JIRA)" <ji...@apache.org> on 2018/05/31 13:54:00 UTC

[jira] [Updated] (SAMZA-1739) Some containers hangs on restore KV store

     [ https://issues.apache.org/jira/browse/SAMZA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akim Akimov updated SAMZA-1739:
-------------------------------
    Description: 
We hit a problem in the prod environment that we could not reproduce in the dev environment.

On application restart, 4 of 12 containers hang while restoring the KV store from its changelog.

Application configuration:

12 containers deployed on YARN. The KV store in question is the window aggregation KV store.

This is how it manifests:

 

{{2018-05-29 16:01:55,650 [main] INFO org.apache.samza.storage.TaskStorageManager - Assigning oldest change log offsets for taskName Partition 8: Map(SystemStream [system=kafka, stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
 {{2018-05-29 16:01:55,653 [main] INFO org.apache.samza.storage.TaskStorageManager - Registering change log consumer with offset 0 for SystemStreamPartition [kafka, chainstream_one-1-window-window_cid_batch, 10].}}
 {{2018-05-29 16:01:55,654 [main] INFO org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for: Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
 {{2018-05-29 16:01:55,655 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Creating new SimpleConsumer for host ip-x.us-east-1.code418.net:9092 for system kafka}}
 {{2018-05-29 16:01:55,656 [main] INFO org.apache.samza.system.kafka.GetOffset - Validating offset 0 for topic and partition [chainstream_one-1-window-window_cid_batch,10]}}
 {{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.GetOffset - Able to successfully read from offset 0 for topic and partition [chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate consumer.}}
 {{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Starting BrokerProxy for ip-x.us-east-1.code418.net:9092}}
 {{2018-05-29 16:01:58,129 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries restored...}}
 {{2018-05-29 16:01:59,707 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries restored...}}
 {{2018-05-29 16:02:01,318 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries restored...}}
 {{2018-05-29 16:02:02,920 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries restored...}}
 {{End of LogType:stdout. This log file belongs to a running container (}}

The other containers start normally:

 

{{2018-05-29 16:02:18,564 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries restored...}}
 {{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Shutting down BrokerProxy for ip-x.net:9092}}
 {{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy - closing simple consumer...}}

 

System:

|Samza|0.14|
|Kafka (kafka.x86_64)|0.11.0.1-1|
|YARN|2.7.3|
|ZooKeeper|3.4.6|

  was:
We hit a problem in the prod environment that we could not reproduce in the dev environment.

On application restart, 4 of 12 containers hang while restoring the KV store from its changelog.

This is how it manifests:

 

{{2018-05-29 16:01:55,650 [main] INFO org.apache.samza.storage.TaskStorageManager - Assigning oldest change log offsets for taskName Partition 8: Map(SystemStream [system=kafka, stream=chainstream_one-1-window-window_cid_batch] -> 0)}}
{{2018-05-29 16:01:55,653 [main] INFO org.apache.samza.storage.TaskStorageManager - Registering change log consumer with offset 0 for SystemStreamPartition [kafka, chainstream_one-1-window-window_cid_batch, 10].}}
{{2018-05-29 16:01:55,654 [main] INFO org.apache.samza.system.kafka.KafkaSystemConsumer - Refreshing brokers for: Map([chainstream_one-1-window-window_cid_batch,10] -> 0)}}
{{2018-05-29 16:01:55,655 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Creating new SimpleConsumer for host ip-x.us-east-1.code418.net:9092 for system kafka}}
{{2018-05-29 16:01:55,656 [main] INFO org.apache.samza.system.kafka.GetOffset - Validating offset 0 for topic and partition [chainstream_one-1-window-window_cid_batch,10]}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.GetOffset - Able to successfully read from offset 0 for topic and partition [chainstream_one-1-window-window_cid_batch,10]. Using it to instantiate consumer.}}
{{2018-05-29 16:01:55,693 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Starting BrokerProxy for ip-x.us-east-1.code418.net:9092}}
{{2018-05-29 16:01:58,129 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 1000000 entries restored...}}
{{2018-05-29 16:01:59,707 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 2000000 entries restored...}}
{{2018-05-29 16:02:01,318 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 3000000 entries restored...}}
{{2018-05-29 16:02:02,920 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 4000000 entries restored...}}
{{End of LogType:stdout. This log file belongs to a running container (}}

The other containers start normally:

 

{{2018-05-29 16:02:18,564 [main] INFO org.apache.samza.storage.kv.KeyValueStorageEngine - 13000000 entries restored...}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy - Shutting down BrokerProxy for ip-x.net:9092}}
{{2018-05-29 16:02:19,700 [main] INFO org.apache.samza.system.kafka.BrokerProxy - closing simple consumer...}}

 

System:

|Samza|0.14|
|Kafka (kafka.x86_64)|0.11.0.1-1|
|YARN|2.7.3|
|ZooKeeper|3.4.6|


> Some containers hangs on restore KV store
> -----------------------------------------
>
>                 Key: SAMZA-1739
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1739
>             Project: Samza
>          Issue Type: Bug
>          Components: kafka, kv-store
>    Affects Versions: 0.14.0
>            Reporter: Akim Akimov
>            Priority: Major


