You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/03 05:17:16 UTC

[GitHub] [druid] maytasm opened a new pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

maytasm opened a new pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610
 
 
   Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
   
   ### Description
   
   An overlord can encountered the following NPE and subsequently triggered a JVM shutdown:
   
   ```
   java.lang.NullPointerException: null
   at org.apache.druid.indexing.overlord.RemoteTaskRunner.lambda$addWorker$4(RemoteTaskRunner.java:1070) 
   ```
   ```
   This is in the following code reproduced below:
   catch (Exception e) {
   log.makeAlert(e, "Failed to handle new worker status")
   .addData("worker", zkWorker.getWorker().getHost())
   .addData("znode", event.getData().getPath())
   .emit();
   }
   ```
   
   This method needs a null check for `event.getData()`, since data can be null: https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/cache/PathChildrenCacheEvent.html
   Also, added null checks for other parameters just to be safe.
   
   Also updated the version of Curator to 4.3.0. Note that Curator version 4.3.0 has the fix for https://github.com/apache/druid/pull/8177 which earlier (before 4.3.0 exist) prevent us upgrading Curator's version.
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#issuecomment-608997375
 
 
   This pull request **introduces 1 alert** when merging 0e62d3da2abb5d05a7bf89f304c8031a2c1a5163 into 4d277dbf9901a592789fbcc6b1dff1ebebb5e00e - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-eb6cd2b1db4c050994096e8fb4b19c6a2537bb87)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404389664
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.codehaus.jackson</groupId>
+      <artifactId>jackson-core-asl</artifactId>
+      <version>${codehaus.jackson.version}</version>
+      <scope>test</scope>
 
 Review comment:
   Please put these test dependencies with others below (at Line 133) together.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404226982
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   If jackson-mapper-asl:1.9.13 is no longer in production code, can you check if its security vulnerability suppression can be removed? https://github.com/apache/druid/blob/master/owasp-dependency-check-suppressions.xml#L125
   
   You can run the security vulnerability scan by running: `mvn dependency-check:check`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403438594
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   The `childData` itself is null. The only other data that we have is the eventType which we can always add to the alert? or only add to alert if `childData` is null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#issuecomment-608705195
 
 
   This pull request **introduces 2 alerts** when merging 51106421b12a43b7f27bc2db0c033997378c85cc into b5419962f0995daf0d4361a137f5f6c123fd9aa0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-2b70c51039cd2e7e1381182bec789e36b8b8c975)
   
   **new alerts:**
   
   * 2 for Dereferenced variable may be null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404366928
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   I tried to run `mvn dependency-check:check` but it failed on druid-console. It also fails on druid-console even on master. Am I doing something wrong? I dont think I see these failures on the cron job
   
   [ERROR] One or more dependencies were identified with vulnerabilities that have a CVSS score greater than or equal to '7.0': 
   [ERROR] 
   [ERROR] package.json: CVE-2020-7608
   [ERROR] package.json: CVE-2020-7608
   [ERROR] package.json: CVE-2020-7608
   [ERROR] package.json: CVE-2020-7608
   [ERROR] package.json: CVE-2020-7608
   [ERROR] package.json: CVE-2020-7608

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403436135
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   Hence, explicitly adding dependecy on those org.codehaus.jackson:jackson-mapper-asl:jar and org.codehaus.jackson:jackson-core-asl:jar

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404366928
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   I tried to run `mvn dependency-check:check` but it failed on druid-console. It also fails on druid-console even on master. Am I doing something wrong? I dont think I see these failures on the cron job
   
   One or more dependencies were identified with known vulnerabilities in druid-console:
   
   package.json (pkg:npm/acorn@6.3.0) : 1488
   package.json (pkg:npm/kind-of@3.2.2) : 1490
   package.json (pkg:npm/yargs-parser@13.1.1) : CVE-2020-7608
   package.json (pkg:npm/minimist@0.0.10) : 1179
   package.json (pkg:npm/yargs@7.1.0) : CVE-2020-7608
   package.json (pkg:npm/yargs-parser@5.0.0) : CVE-2020-7608
   package.json (pkg:npm/yargs-parser@10.1.0) : CVE-2020-7608
   package.json (pkg:npm/yargs@12.0.5) : CVE-2020-7608
   package.json (pkg:npm/yargs-parser@11.1.1) : CVE-2020-7608

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404391313
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -969,116 +970,129 @@ private boolean cancelWorkerCleanup(String workerHost)
       );
 
       // Add status listener to the watcher for status changes
-      zkWorker.addListener(
-          (client, event) -> {
-            final String taskId;
-            final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
-            synchronized (statusLock) {
-              try {
-                switch (event.getType()) {
-                  case CHILD_ADDED:
-                  case CHILD_UPDATED:
-                    taskId = ZKPaths.getNodeFromPath(event.getData().getPath());
-                    final TaskAnnouncement announcement = jsonMapper.readValue(
-                        event.getData().getData(), TaskAnnouncement.class
-                    );
-
-                    log.info(
-                        "Worker[%s] wrote %s status for task [%s] on [%s]",
-                        zkWorker.getWorker().getHost(),
-                        announcement.getTaskStatus().getStatusCode(),
-                        taskId,
-                        announcement.getTaskLocation()
-                    );
-
-                    // Synchronizing state with ZK
-                    statusLock.notifyAll();
-
-                    final RemoteTaskRunnerWorkItem tmp;
-                    if ((tmp = runningTasks.get(taskId)) != null) {
-                      taskRunnerWorkItem = tmp;
-                    } else {
-                      final RemoteTaskRunnerWorkItem newTaskRunnerWorkItem = new RemoteTaskRunnerWorkItem(
-                          taskId,
-                          announcement.getTaskType(),
-                          zkWorker.getWorker(),
-                          TaskLocation.unknown(),
-                          announcement.getTaskDataSource()
-                      );
-                      final RemoteTaskRunnerWorkItem existingItem = runningTasks.putIfAbsent(
-                          taskId,
-                          newTaskRunnerWorkItem
-                      );
-                      if (existingItem == null) {
-                        log.warn(
-                            "Worker[%s] announced a status for a task I didn't know about, adding to runningTasks: %s",
-                            zkWorker.getWorker().getHost(),
-                            taskId
-                        );
-                        taskRunnerWorkItem = newTaskRunnerWorkItem;
-                      } else {
-                        taskRunnerWorkItem = existingItem;
-                      }
-                    }
-
-                    if (!announcement.getTaskLocation().equals(taskRunnerWorkItem.getLocation())) {
-                      taskRunnerWorkItem.setLocation(announcement.getTaskLocation());
-                      TaskRunnerUtils.notifyLocationChanged(listeners, taskId, announcement.getTaskLocation());
-                    }
+      zkWorker.addListener(getStatusListener(worker, zkWorker, retVal));
+      zkWorker.start();
+      return retVal;
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
 
-                    if (announcement.getTaskStatus().isComplete()) {
-                      taskComplete(taskRunnerWorkItem, zkWorker, announcement.getTaskStatus());
-                      runPendingTasks();
-                    }
-                    break;
-                  case CHILD_REMOVED:
-                    taskId = ZKPaths.getNodeFromPath(event.getData().getPath());
-                    taskRunnerWorkItem = runningTasks.remove(taskId);
-                    if (taskRunnerWorkItem != null) {
-                      log.info("Task[%s] just disappeared!", taskId);
-                      taskRunnerWorkItem.setResult(TaskStatus.failure(taskId));
-                      TaskRunnerUtils.notifyStatusChanged(listeners, taskId, TaskStatus.failure(taskId));
-                    } else {
-                      log.info("Task[%s] went bye bye.", taskId);
-                    }
-                    break;
-                  case INITIALIZED:
-                    if (zkWorkers.putIfAbsent(worker.getHost(), zkWorker) == null) {
-                      retVal.set(zkWorker);
-                    } else {
-                      final String message = StringUtils.format(
-                          "WTF?! Tried to add already-existing worker[%s]",
-                          worker.getHost()
-                      );
-                      log.makeAlert(message)
-                         .addData("workerHost", worker.getHost())
-                         .addData("workerIp", worker.getIp())
-                         .emit();
-                      retVal.setException(new IllegalStateException(message));
-                    }
-                    runPendingTasks();
-                    break;
-                  case CONNECTION_SUSPENDED:
-                  case CONNECTION_RECONNECTED:
-                  case CONNECTION_LOST:
-                    // do nothing
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
+              final TaskAnnouncement announcement = jsonMapper.readValue(
+                  event.getData().getData(), TaskAnnouncement.class // lgtm [java/dereferenced-value-may-be-null]
+              );
+
+              log.info(
+                  "Worker[%s] wrote %s status for task [%s] on [%s]",
+                  zkWorker.getWorker().getHost(),
+                  announcement.getTaskStatus().getStatusCode(),
+                  taskId,
+                  announcement.getTaskLocation()
+              );
+
+              // Synchronizing state with ZK
+              statusLock.notifyAll();
+
+              final RemoteTaskRunnerWorkItem tmp;
+              if ((tmp = runningTasks.get(taskId)) != null) {
+                taskRunnerWorkItem = tmp;
+              } else {
+                final RemoteTaskRunnerWorkItem newTaskRunnerWorkItem = new RemoteTaskRunnerWorkItem(
+                    taskId,
+                    announcement.getTaskType(),
+                    zkWorker.getWorker(),
+                    TaskLocation.unknown(),
+                    announcement.getTaskDataSource()
+                );
+                final RemoteTaskRunnerWorkItem existingItem = runningTasks.putIfAbsent(
+                    taskId,
+                    newTaskRunnerWorkItem
+                );
+                if (existingItem == null) {
+                  log.warn(
+                      "Worker[%s] announced a status for a task I didn't know about, adding to runningTasks: %s",
+                      zkWorker.getWorker().getHost(),
+                      taskId
+                  );
+                  taskRunnerWorkItem = newTaskRunnerWorkItem;
+                } else {
+                  taskRunnerWorkItem = existingItem;
                 }
               }
-              catch (Exception e) {
-                log.makeAlert(e, "Failed to handle new worker status")
-                   .addData("worker", zkWorker.getWorker().getHost())
-                   .addData("znode", event.getData().getPath())
+
+              if (!announcement.getTaskLocation().equals(taskRunnerWorkItem.getLocation())) {
+                taskRunnerWorkItem.setLocation(announcement.getTaskLocation());
+                TaskRunnerUtils.notifyLocationChanged(listeners, taskId, announcement.getTaskLocation());
+              }
+
+              if (announcement.getTaskStatus().isComplete()) {
+                taskComplete(taskRunnerWorkItem, zkWorker, announcement.getTaskStatus());
+                runPendingTasks();
+              }
+              break;
+            case CHILD_REMOVED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
+              taskRunnerWorkItem = runningTasks.remove(taskId);
+              if (taskRunnerWorkItem != null) {
+                log.info("Task[%s] just disappeared!", taskId);
+                taskRunnerWorkItem.setResult(TaskStatus.failure(taskId));
+                TaskRunnerUtils.notifyStatusChanged(listeners, taskId, TaskStatus.failure(taskId));
+              } else {
+                log.info("Task[%s] went bye bye.", taskId);
+              }
+              break;
+            case INITIALIZED:
+              if (zkWorkers.putIfAbsent(worker.getHost(), zkWorker) == null) {
+                retVal.set(zkWorker);
+              } else {
+                final String message = StringUtils.format(
+                    "This should not happen...tried to add already-existing worker[%s]",
+                    worker.getHost()
+                );
+                log.makeAlert(message)
+                   .addData("workerHost", worker.getHost())
+                   .addData("workerIp", worker.getIp())
                    .emit();
+                retVal.setException(new IllegalStateException(message));
               }
+              runPendingTasks();
+              break;
+            case CONNECTION_SUSPENDED:
+            case CONNECTION_RECONNECTED:
+            case CONNECTION_LOST:
+              // do nothing
+          }
+        }
+        catch (Exception e) {
+          String znode = null;
+          String eventType = null;
+          if (event != null) {
 
 Review comment:
   I don't think `event` can ever be null. Check out callers of `PathChildrenCache.callListeners()`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403436420
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   Yes. event.getData() can be null here and everywhere else in this function. I dont think we need to do anything differently in handling the null from event.getData() so I just let NPE happens and the catch will take care of logging and making sure the JVM doesnt shutdown. You can deduce that the NPE from event.getData() happens from the data map of the alert and the Exception that was produced

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403437970
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   This was the dependency tree before the change in this PR
   [INFO] |  +- org.apache.curator:curator-x-discovery:jar:4.1.0:test
   [INFO] |  |  \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:test
   [INFO] |  |     \- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403436150
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   Actually I think they can be scope test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404595527
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -969,116 +970,129 @@ private boolean cancelWorkerCleanup(String workerHost)
       );
 
       // Add status listener to the watcher for status changes
-      zkWorker.addListener(
-          (client, event) -> {
-            final String taskId;
-            final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
-            synchronized (statusLock) {
-              try {
-                switch (event.getType()) {
-                  case CHILD_ADDED:
-                  case CHILD_UPDATED:
-                    taskId = ZKPaths.getNodeFromPath(event.getData().getPath());
-                    final TaskAnnouncement announcement = jsonMapper.readValue(
-                        event.getData().getData(), TaskAnnouncement.class
-                    );
-
-                    log.info(
-                        "Worker[%s] wrote %s status for task [%s] on [%s]",
-                        zkWorker.getWorker().getHost(),
-                        announcement.getTaskStatus().getStatusCode(),
-                        taskId,
-                        announcement.getTaskLocation()
-                    );
-
-                    // Synchronizing state with ZK
-                    statusLock.notifyAll();
-
-                    final RemoteTaskRunnerWorkItem tmp;
-                    if ((tmp = runningTasks.get(taskId)) != null) {
-                      taskRunnerWorkItem = tmp;
-                    } else {
-                      final RemoteTaskRunnerWorkItem newTaskRunnerWorkItem = new RemoteTaskRunnerWorkItem(
-                          taskId,
-                          announcement.getTaskType(),
-                          zkWorker.getWorker(),
-                          TaskLocation.unknown(),
-                          announcement.getTaskDataSource()
-                      );
-                      final RemoteTaskRunnerWorkItem existingItem = runningTasks.putIfAbsent(
-                          taskId,
-                          newTaskRunnerWorkItem
-                      );
-                      if (existingItem == null) {
-                        log.warn(
-                            "Worker[%s] announced a status for a task I didn't know about, adding to runningTasks: %s",
-                            zkWorker.getWorker().getHost(),
-                            taskId
-                        );
-                        taskRunnerWorkItem = newTaskRunnerWorkItem;
-                      } else {
-                        taskRunnerWorkItem = existingItem;
-                      }
-                    }
-
-                    if (!announcement.getTaskLocation().equals(taskRunnerWorkItem.getLocation())) {
-                      taskRunnerWorkItem.setLocation(announcement.getTaskLocation());
-                      TaskRunnerUtils.notifyLocationChanged(listeners, taskId, announcement.getTaskLocation());
-                    }
+      zkWorker.addListener(getStatusListener(worker, zkWorker, retVal));
+      zkWorker.start();
+      return retVal;
+    }
+    catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
 
-                    if (announcement.getTaskStatus().isComplete()) {
-                      taskComplete(taskRunnerWorkItem, zkWorker, announcement.getTaskStatus());
-                      runPendingTasks();
-                    }
-                    break;
-                  case CHILD_REMOVED:
-                    taskId = ZKPaths.getNodeFromPath(event.getData().getPath());
-                    taskRunnerWorkItem = runningTasks.remove(taskId);
-                    if (taskRunnerWorkItem != null) {
-                      log.info("Task[%s] just disappeared!", taskId);
-                      taskRunnerWorkItem.setResult(TaskStatus.failure(taskId));
-                      TaskRunnerUtils.notifyStatusChanged(listeners, taskId, TaskStatus.failure(taskId));
-                    } else {
-                      log.info("Task[%s] went bye bye.", taskId);
-                    }
-                    break;
-                  case INITIALIZED:
-                    if (zkWorkers.putIfAbsent(worker.getHost(), zkWorker) == null) {
-                      retVal.set(zkWorker);
-                    } else {
-                      final String message = StringUtils.format(
-                          "WTF?! Tried to add already-existing worker[%s]",
-                          worker.getHost()
-                      );
-                      log.makeAlert(message)
-                         .addData("workerHost", worker.getHost())
-                         .addData("workerIp", worker.getIp())
-                         .emit();
-                      retVal.setException(new IllegalStateException(message));
-                    }
-                    runPendingTasks();
-                    break;
-                  case CONNECTION_SUSPENDED:
-                  case CONNECTION_RECONNECTED:
-                  case CONNECTION_LOST:
-                    // do nothing
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
+              final TaskAnnouncement announcement = jsonMapper.readValue(
+                  event.getData().getData(), TaskAnnouncement.class // lgtm [java/dereferenced-value-may-be-null]
+              );
+
+              log.info(
+                  "Worker[%s] wrote %s status for task [%s] on [%s]",
+                  zkWorker.getWorker().getHost(),
+                  announcement.getTaskStatus().getStatusCode(),
+                  taskId,
+                  announcement.getTaskLocation()
+              );
+
+              // Synchronizing state with ZK
+              statusLock.notifyAll();
+
+              final RemoteTaskRunnerWorkItem tmp;
+              if ((tmp = runningTasks.get(taskId)) != null) {
+                taskRunnerWorkItem = tmp;
+              } else {
+                final RemoteTaskRunnerWorkItem newTaskRunnerWorkItem = new RemoteTaskRunnerWorkItem(
+                    taskId,
+                    announcement.getTaskType(),
+                    zkWorker.getWorker(),
+                    TaskLocation.unknown(),
+                    announcement.getTaskDataSource()
+                );
+                final RemoteTaskRunnerWorkItem existingItem = runningTasks.putIfAbsent(
+                    taskId,
+                    newTaskRunnerWorkItem
+                );
+                if (existingItem == null) {
+                  log.warn(
+                      "Worker[%s] announced a status for a task I didn't know about, adding to runningTasks: %s",
+                      zkWorker.getWorker().getHost(),
+                      taskId
+                  );
+                  taskRunnerWorkItem = newTaskRunnerWorkItem;
+                } else {
+                  taskRunnerWorkItem = existingItem;
                 }
               }
-              catch (Exception e) {
-                log.makeAlert(e, "Failed to handle new worker status")
-                   .addData("worker", zkWorker.getWorker().getHost())
-                   .addData("znode", event.getData().getPath())
+
+              if (!announcement.getTaskLocation().equals(taskRunnerWorkItem.getLocation())) {
+                taskRunnerWorkItem.setLocation(announcement.getTaskLocation());
+                TaskRunnerUtils.notifyLocationChanged(listeners, taskId, announcement.getTaskLocation());
+              }
+
+              if (announcement.getTaskStatus().isComplete()) {
+                taskComplete(taskRunnerWorkItem, zkWorker, announcement.getTaskStatus());
+                runPendingTasks();
+              }
+              break;
+            case CHILD_REMOVED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
+              taskRunnerWorkItem = runningTasks.remove(taskId);
+              if (taskRunnerWorkItem != null) {
+                log.info("Task[%s] just disappeared!", taskId);
+                taskRunnerWorkItem.setResult(TaskStatus.failure(taskId));
+                TaskRunnerUtils.notifyStatusChanged(listeners, taskId, TaskStatus.failure(taskId));
+              } else {
+                log.info("Task[%s] went bye bye.", taskId);
+              }
+              break;
+            case INITIALIZED:
+              if (zkWorkers.putIfAbsent(worker.getHost(), zkWorker) == null) {
+                retVal.set(zkWorker);
+              } else {
+                final String message = StringUtils.format(
+                    "This should not happen...tried to add already-existing worker[%s]",
+                    worker.getHost()
+                );
+                log.makeAlert(message)
+                   .addData("workerHost", worker.getHost())
+                   .addData("workerIp", worker.getIp())
                    .emit();
+                retVal.setException(new IllegalStateException(message));
               }
+              runPendingTasks();
+              break;
+            case CONNECTION_SUSPENDED:
+            case CONNECTION_RECONNECTED:
+            case CONNECTION_LOST:
+              // do nothing
+          }
+        }
+        catch (Exception e) {
+          String znode = null;
+          String eventType = null;
+          if (event != null) {
 
 Review comment:
   event cannot be null. I added the check just in case upstream and/or dependency library changes. This is to prevent the JVM from shutting down. I removed it for now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403436420
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   Yes. event.getData() can be null here and everywhere else in this function. I dont think we need to do anything differently in handling the null from event.getData() so I just let NPE happens and the catch will take care of logging and making sure the JVM doesnt shutdown. You can deduce that the NPE from event.getData() happens from the data map of the alert

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404386598
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   If you skip druid console, do all the other modules pass? If so, then the suppression for jackson-mapper-asl can be removed. We'll need to fix the other issues in different PRs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403437945
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   Changed to scope test.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#issuecomment-608974968
 
 
   This pull request **introduces 1 alert** when merging 258bfe476f5377789faec9bbf6ab9e1869110f99 into b5419962f0995daf0d4361a137f5f6c123fd9aa0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-f1d002928db975e030e78d0311f31d65f952e8ae)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403436420
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   Yes. event.getData() can be null here and everywhere else in this function. I dont think we need to do anything differently in handling the null event so I just let NPE happens and the catch will take care of logging and making sure the JVM doesnt shutdown. You can deduce that the NPE from event.getData() happens from the data map of the alert

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404586950
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   No. Other modules such as druid-orc-extensions and druid-hdfs-storage fails. This is because jackson-mapper-asl-1.9.13.jar is pulled in by org.apache.hadoop:hadoop-common:jar:2.8.5

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403433127
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   what is this change about, was this relying on curator providing it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404399519
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   >You can deduce that the NPE from event.getData() happens from the data map of the alert and the Exception that was produced
   
   Ah yeah, I know _I can deduce this_, my point was more to be useful for operators who see an error or alert and don't want to dig through the code, which is probably most of them, that we could indicate friendlier messaging that an unexpected situation where the child data was null occurred on a zk event.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#issuecomment-608961801
 
 
   This pull request **introduces 1 alert** when merging 868176850bab929a43a7e1f3a8de4c48047afda4 into b5419962f0995daf0d4361a137f5f6c123fd9aa0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-587335d2ba7e7d77c40221de2683a7c35983086c)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404593305
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   - Confirmed that event cannot be null. Removed null check for the event.
   - Added a null check blocks with different alert message than the existing message for catching any Exception. This should be more helpful to operators. Let me know if you have suggestion on different wording or other KV data to add to alert. This check is for the CHILD_ADDED, CHILD_UPDATED, and CHILD_REMOVED events which even.getData() must not be null 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404593305
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   - Confirmed that event cannot be null. Removed null check for the event.
   - Added a separate catch block for NPE with a different alert message than the existing message for catching any Exception. This should be more helpful to operators. Let me know if you have suggestion on different wording or other KV data to add to alert.
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
ccaominh commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#issuecomment-608554699
 
 
   PR description says curator 4.3.0, but changes upgraded to 4.2.0?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403433584
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   Per the PR description/linked javadocs, can this and other usages of `event.getData()` in this switch statement be null? I guess we are relying on an NPE happening and then the catch checking and handling there? It might be nicer to explicitly check and throw a more useful error about the `ChildData` for the event unexpectedly being null so that the alerted exception is more useful

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r403436115
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
 
 Review comment:
   Yes. curator 4.1.0 was providing both org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 and org.codehaus.jackson:jackson-core-asl:jar:1.9.13. Curator 4.3.0 no longer provides those which then causes java.lang.NoClassDefFoundError: org/codehaus/jackson/map/AnnotationIntrospector. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404593305
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   - Confirmed that event cannot be null. Removed null check for the event.
   - Added a null check blocks for event.getData() with different alert message than the existing message for catching any Exception. This should be more helpful to operators. Let me know if you have suggestion on different wording or other KV data to add to alert. This check is for the CHILD_ADDED, CHILD_UPDATED, and CHILD_REMOVED events which even.getData() must not be null.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404594626
 
 

 ##########
 File path: extensions-contrib/ambari-metrics-emitter/pom.xml
 ##########
 @@ -48,6 +48,18 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.codehaus.jackson</groupId>
+      <artifactId>jackson-core-asl</artifactId>
+      <version>${codehaus.jackson.version}</version>
+      <scope>test</scope>
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#discussion_r404395190
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
 ##########
 @@ -1081,6 +979,117 @@ private boolean cancelWorkerCleanup(String workerHost)
     }
   }
 
+  @VisibleForTesting
+  PathChildrenCacheListener getStatusListener(final Worker worker, final ZkWorker zkWorker, final SettableFuture<ZkWorker> retVal)
+  {
+    return (client, event) -> {
+      final String taskId;
+      final RemoteTaskRunnerWorkItem taskRunnerWorkItem;
+      synchronized (statusLock) {
+        try {
+          switch (event.getType()) { // lgtm [java/dereferenced-value-may-be-null]
+            case CHILD_ADDED:
+            case CHILD_UPDATED:
+              taskId = ZKPaths.getNodeFromPath(event.getData().getPath()); // lgtm [java/dereferenced-value-may-be-null]
 
 Review comment:
   > Yes. event.getData() can be null here and everywhere else in this function. I dont think we need to do anything differently in handling the null from event.getData() so I just let NPE happens and the catch will take care of logging and making sure the JVM doesnt shutdown.
   
   `even.getData()` can be null depending on the event type; it must not be null for `CHILD_ADDED`, `CHILD_UPDATED`, and `CHILD_REMOVED` events. It should be null otherwise. I think it's worth checking null depending on the event type.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

Posted by GitBox <gi...@apache.org>.
maytasm commented on issue #9610: Fix NPE in RemoteTaskRunner event handler causes JVM shutdown
URL: https://github.com/apache/druid/pull/9610#issuecomment-608645153
 
 
   > PR description says curator 4.3.0, but changes upgraded to 4.2.0?
   
   Good catch. It is suppose to be 4.3.0. Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org