Posted to commits@samza.apache.org by "Shanthoosh Venkataraman (JIRA)" <ji...@apache.org> on 2018/03/07 08:32:00 UTC

[jira] [Updated] (SAMZA-1607) Fix bug in reading the ephemeral processor nodes from zookeeper.

     [ https://issues.apache.org/jira/browse/SAMZA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shanthoosh Venkataraman updated SAMZA-1607:
-------------------------------------------
    Description: 
The existing implementation reads the data of the ephemeral processor nodes in ZooKeeper in two steps.

   A. Fetch the list of ephemeral processor nodes.

   B. Read the data of each processor node from the list. 

Some ZooKeeper nodes listed in step A might no longer exist by the time step B reads them, because an ephemeral node is deleted when its owning session ends. The resulting exception is currently unhandled and can kill the leader processor unnecessarily. Here's the related exception observed in a dev setup.
{code:java}
org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001

at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
{code}
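
The race between steps A and B can be reproduced without a ZooKeeper cluster. The following is a minimal sketch, not the actual Samza fix: FakeZk and NoNodeException are hypothetical in-memory stand-ins for the ZooKeeper client and for ZkNoNodeException, and readActiveProcessorData is an illustrative name. It assumes the right behaviour is to skip a processor node that has vanished, since a missing ephemeral node means that processor is no longer active.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class EphemeralNodeReader {

    /** Hypothetical stand-in for ZkNoNodeException. */
    static class NoNodeException extends RuntimeException {
        NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for the ZooKeeper client. */
    static class FakeZk {
        // Sorted map, so children come back in sequence-number order.
        private final ConcurrentSkipListMap<String, String> nodes = new ConcurrentSkipListMap<>();

        void create(String path, String data) {
            nodes.put(path, data);
        }

        void delete(String path) {
            nodes.remove(path);
        }

        /** Step A: snapshot of the node paths that exist right now. */
        List<String> getChildren() {
            return new ArrayList<>(nodes.keySet());
        }

        /** Throws NoNodeException when the node no longer exists. */
        String readData(String path) {
            String data = nodes.get(path);
            if (data == null) {
                throw new NoNodeException(path);
            }
            return data;
        }
    }

    /** Step B: read each listed node, skipping nodes deleted since step A. */
    static List<String> readActiveProcessorData(FakeZk zk, List<String> children) {
        List<String> result = new ArrayList<>();
        for (String child : children) {
            try {
                result.add(zk.readData(child));
            } catch (NoNodeException e) {
                // The processor went away between step A and step B.
                // It is no longer active, so skip it instead of failing.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create("/processors/0000000000", "processor-0");
        zk.create("/processors/0000000001", "processor-1");

        List<String> children = zk.getChildren();   // step A
        zk.delete("/processors/0000000001");        // processor 1 dies here
        System.out.println(readActiveProcessorData(zk, children)); // prints [processor-0]
    }
}
```

Catching only the no-node case keeps other read failures (for example, connection loss) visible to the caller. With the real client, the readData overload that takes a returnNullIfPathNotExists flag may be an alternative to catching the exception; whether that suits ZkUtils is left open here.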

  was:
Existing implementation of reading the data of ephemeral processor nodes in zookeeper happens in two steps.

   A. Fetch the list of ephemeral processor nodes.

   B. Read the data of each processor node from the list. 

Some zookeeper nodes present in step A might be unavailable in the step B. This exception is unhandled currently and can kill the leader processor unnecessarily. Here's the sample exception observed in a dev setup.
{code:java}
org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001

at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)

{code}

> Fix bug in reading the ephemeral processor nodes from zookeeper.
> ----------------------------------------------------------------
>
>                 Key: SAMZA-1607
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1607
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> The existing implementation reads the data of the ephemeral processor nodes in ZooKeeper in two steps.
>    A. Fetch the list of ephemeral processor nodes.
>    B. Read the data of each processor node from the list. 
> Some ZooKeeper nodes listed in step A might no longer exist by the time step B reads them, because an ephemeral node is deleted when its owning session ends. The resulting exception is currently unhandled and can kill the leader processor unnecessarily. Here's the related exception observed in a dev setup.
> {code:java}
> org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
> at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
> at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
> at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
> at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
> at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
> at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
> at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
> at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)