Posted to commits@samza.apache.org by "Michael Strobl (JIRA)" <ji...@apache.org> on 2015/05/20 16:47:00 UTC

[jira] [Created] (SAMZA-684) Wrong partition count in auto-created changelog streams for KV Store

Michael Strobl created SAMZA-684:
------------------------------------

             Summary: Wrong partition count in auto-created changelog streams for KV Store
                 Key: SAMZA-684
                 URL: https://issues.apache.org/jira/browse/SAMZA-684
             Project: Samza
          Issue Type: Bug
          Components: container
    Affects Versions: 0.9.0
            Reporter: Michael Strobl


[SAMZA-226] implemented an auto-create feature for the changelog streams of KV stores. Unfortunately, there is a bug when the job is deployed on a YARN cluster with more than one task container:

The auto-created changelog Kafka topics for the KV stores have wrong partition counts. I deployed the job several times, and the changelog topics were created with the correct number of partitions only at low container counts. When the partition count is wrong, the job deployment also fails; see the error logs below.

Below is the Kafka topic list. The task input is the topic "portal-events" with 16 partitions. The auto-created changelog topics "buffer-store-changelog" and "user-store-changelog" have wrong partition counts (14 and 12; both should be 16):

Topic:__samza_checkpoint_ver_1_for_portal-parser_1	PartitionCount:1	ReplicationFactor:3	Configs:segment.bytes=26214400,cleanup.policy=compact
	Topic: __samza_checkpoint_ver_1_for_portal-parser_1	Partition: 0	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
Topic:portal-events	PartitionCount:16	ReplicationFactor:3	Configs:
	Topic: portal-events	Partition: 0	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: portal-events	Partition: 1	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: portal-events	Partition: 2	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: portal-events	Partition: 3	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
	Topic: portal-events	Partition: 4	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: portal-events	Partition: 5	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
	Topic: portal-events	Partition: 6	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: portal-events	Partition: 7	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: portal-events	Partition: 8	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: portal-events	Partition: 9	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
	Topic: portal-events	Partition: 10	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: portal-events	Partition: 11	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
	Topic: portal-events	Partition: 12	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: portal-events	Partition: 13	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: portal-events	Partition: 14	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: portal-events	Partition: 15	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
Topic:buffer-store-changelog	PartitionCount:14	ReplicationFactor:3	Configs:segment.bytes=536870912,cleanup.policy=compact
	Topic: buffer-store-changelog	Partition: 0	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: buffer-store-changelog	Partition: 1	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
	Topic: buffer-store-changelog	Partition: 2	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
	Topic: buffer-store-changelog	Partition: 3	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: buffer-store-changelog	Partition: 4	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: buffer-store-changelog	Partition: 5	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: buffer-store-changelog	Partition: 6	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: buffer-store-changelog	Partition: 7	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
	Topic: buffer-store-changelog	Partition: 8	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
	Topic: buffer-store-changelog	Partition: 9	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: buffer-store-changelog	Partition: 10	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: buffer-store-changelog	Partition: 11	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: buffer-store-changelog	Partition: 12	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: buffer-store-changelog	Partition: 13	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
Topic:user-store-changelog	PartitionCount:12	ReplicationFactor:3	Configs:segment.bytes=536870912,cleanup.policy=compact
	Topic: user-store-changelog	Partition: 0	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: user-store-changelog	Partition: 1	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: user-store-changelog	Partition: 2	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: user-store-changelog	Partition: 3	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
	Topic: user-store-changelog	Partition: 4	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: user-store-changelog	Partition: 5	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
	Topic: user-store-changelog	Partition: 6	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: user-store-changelog	Partition: 7	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: user-store-changelog	Partition: 8	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: user-store-changelog	Partition: 9	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
	Topic: user-store-changelog	Partition: 10	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: user-store-changelog	Partition: 11	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
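
The mismatch in the listing above can also be checked mechanically. The following is a minimal sketch, not part of Samza or Kafka: it assumes the topic description is available as text in the format shown above, and that changelog topics can be recognized by a "-changelog" suffix (true for the two topics in this report, but only a naming assumption in general). The names partition_counts, mismatched_changelogs, and SAMPLE are hypothetical.

```python
import re

# Matches only the per-topic summary line, e.g.
# "Topic:portal-events<TAB>PartitionCount:16<TAB>ReplicationFactor:3<TAB>Configs:"
# Per-partition detail lines contain "Partition:" rather than "PartitionCount:",
# so they never match.
SUMMARY = re.compile(r"^Topic:\s*(\S+)\s+PartitionCount:\s*(\d+)")

def partition_counts(describe_output):
    """Return {topic name: partition count} parsed from the summary lines."""
    counts = {}
    for line in describe_output.splitlines():
        match = SUMMARY.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

def mismatched_changelogs(counts, input_topic, changelog_suffix="-changelog"):
    """Return changelog topics whose partition count differs from the input topic's.

    Assumption: changelog topics are identified by name suffix only.
    """
    expected = counts[input_topic]
    return {
        topic: n
        for topic, n in counts.items()
        if topic.endswith(changelog_suffix) and n != expected
    }

# Abbreviated copy of the listing above (summary lines plus one detail line).
SAMPLE = (
    "Topic:__samza_checkpoint_ver_1_for_portal-parser_1\tPartitionCount:1\tReplicationFactor:3\tConfigs:segment.bytes=26214400,cleanup.policy=compact\n"
    "Topic:portal-events\tPartitionCount:16\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: portal-events\tPartition: 0\tLeader: 3\tReplicas: 3,1,2\tIsr: 3,1,2\n"
    "Topic:buffer-store-changelog\tPartitionCount:14\tReplicationFactor:3\tConfigs:segment.bytes=536870912,cleanup.policy=compact\n"
    "Topic:user-store-changelog\tPartitionCount:12\tReplicationFactor:3\tConfigs:segment.bytes=536870912,cleanup.policy=compact\n"
)

if __name__ == "__main__":
    print(mismatched_changelogs(partition_counts(SAMPLE), "portal-events"))
```

Run against the listing in this report, the check flags both changelog topics and ignores the checkpoint topic, which is expected to have a single partition.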




Lines of the Samza application master error log:

2015-05-20 16:08:48 AbstractHttpConnection [DEBUG] closed AsyncHttpConnection@3e134a3f,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1
2015-05-20 16:08:49 JvmMetrics [DEBUG] updating jvm metrics
2015-05-20 16:08:49 JvmMetrics [DEBUG] updated metrics to: [64.090485, 64.875, 247.20428, 723.5, 0, 21, 9, 5, 23, 0, 6, 176]
2015-05-20 16:08:49 Client [DEBUG] IPC Client (828072249) connection to yrmsl1/10.100.86.204:8030 from yarn sending #24
2015-05-20 16:08:49 Client [DEBUG] IPC Client (828072249) connection to yrmsl1/10.100.86.204:8030 from yarn got value #24
2015-05-20 16:08:49 ProtobufRpcEngine [DEBUG] Call: allocate took 2ms
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Container container_1431707519455_0024_01_000002 failed with exit code 1 - Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
.
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Failed container container_1431707519455_0024_01_000002 owned task id 0.
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Current fail count for task id 0 is 1.
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 1024mb of memory


Lines of the task container error log:

java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
org.apache.samza.system.kafka.KafkaSystemAdmin$KafkaChangelogException: Changelog topic validation failed for topic buffer-store-changelog because partition count 14 did not match expected partition count of 16
        at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$validateTopicInKafka$2.apply(KafkaSystemAdmin.scala:327)
        at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$validateTopicInKafka$2.apply(KafkaSystemAdmin.scala:319)
        at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82)
        at org.apache.samza.system.kafka.KafkaSystemAdmin.validateTopicInKafka(KafkaSystemAdmin.scala:318)
        at org.apache.samza.system.kafka.KafkaSystemAdmin.createChangelogStream(KafkaSystemAdmin.scala:354)
        at org.apache.samza.storage.TaskStorageManager$$anonfun$createStreams$3.apply(TaskStorageManager.scala:86)
        at org.apache.samza.storage.TaskStorageManager$$anonfun$createStreams$3.apply(TaskStorageManager.scala:84)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.immutable.Map$Map2.foreach(Map.scala:130)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.samza.storage.TaskStorageManager.createStreams(TaskStorageManager.scala:84)
        at org.apache.samza.storage.TaskStorageManager.init(TaskStorageManager.scala:63)
        at org.apache.samza.container.TaskInstance.startStores(TaskInstance.scala:90)
        at org.apache.samza.container.SamzaContainer$$anonfun$startStores$2.apply(SamzaContainer.scala:600)
        at org.apache.samza.container.SamzaContainer$$anonfun$startStores$2.apply(SamzaContainer.scala:600)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
        at org.apache.samza.container.SamzaContainer.startStores(SamzaContainer.scala:600)
        at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:543)
        at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:93)
        at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:67)
        at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
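
When collecting container logs from several deployments, the affected topic and the two partition counts can be pulled out of the exception message. This is a minimal sketch that assumes the message wording matches the KafkaChangelogException line above exactly; the names MESSAGE and parse_changelog_failure are hypothetical and not part of Samza.

```python
import re

# Pattern built from the exception message quoted in the container log above.
MESSAGE = re.compile(
    r"Changelog topic validation failed for topic (\S+) because "
    r"partition count (\d+) did not match expected partition count of (\d+)"
)

def parse_changelog_failure(log_text):
    """Return (topic, actual_count, expected_count), or None if no failure is found."""
    match = MESSAGE.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))

if __name__ == "__main__":
    line = (
        "org.apache.samza.system.kafka.KafkaSystemAdmin$KafkaChangelogException: "
        "Changelog topic validation failed for topic buffer-store-changelog because "
        "partition count 14 did not match expected partition count of 16"
    )
    print(parse_changelog_failure(line))
```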


