Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/13 00:33:25 UTC

[GitHub] [druid] humit0 opened a new issue #11589: Cannot retrieve proper tasks list from Overlord Task API with createdTimeInterval parameter

humit0 opened a new issue #11589:
URL: https://github.com/apache/druid/issues/11589


   ### Affected Version
   
   0.21.1
   
   ### Description
   
   [docs](https://druid.apache.org/docs/0.21.1/operations/api-reference.html#get-14)
   
   According to the documentation, the `/druid/indexer/v1/tasks` API accepts a `createdTimeInterval` parameter.
   
   If I call the API without the `createdTimeInterval` parameter, I retrieve 1 task whose createdTime is `2021-08-12T23:51:23.151Z`.
   ```sh
   curl -X GET "http://{OVERLORD_HOST}:8090/druid/indexer/v1/tasks?datasource=new-data-source"
   ```
   ```json
   [{"id":"index_parallel_new-data-source_mfchdpon_2021-08-12T23:51:23.142Z","groupId":"index_parallel_new-data-source_mfchdpon_2021-08-12T23:51:23.142Z","type":"index_parallel","createdTime":"2021-08-12T23:51:23.151Z","queueInsertionTime":"1970-01-01T00:00:00.000Z","statusCode":"SUCCESS","status":"SUCCESS","runnerStatusCode":"NONE","duration":25546,"location":{"host":"{MIDDLE_MANAGER_HOST}","port":8100,"tlsPort":-1},"dataSource":"new-data-source","errorMsg":null}]
   ```
   
   The createdTime of this task is `2021-08-12T23:51:23.151Z`, so the `createdTimeInterval` "2021-08-12T23:50:00.000Z_2021-08-13T00:00:00.000Z" should contain this task. But I get an empty list.
   ```sh
   curl -X GET "http://{OVERLORD_HOST}:8090/druid/indexer/v1/tasks?datasource=new-data-source&createdTimeInterval=2021-08-12T23:50:00.000Z_2021-08-13T00:00:00.000Z"
   ```
   ```json
   []
   ```
   
   But when I specify the `createdTimeInterval` "2021-01-01T00:00:00.000Z_2021-01-02T00:00:10.000Z", I retrieve 1 task whose createdTime is `2021-08-12T23:51:23.151Z`, even though that interval does not contain it.
   ```sh
   curl -X GET "http://{OVERLORD_HOST}:8090/druid/indexer/v1/tasks?datasource=new-data-source&createdTimeInterval=2021-01-01T00:00:00.000Z_2021-01-02T00:00:10.000Z"
   ```
   ```json
   [{"id":"index_parallel_new-data-source_mfchdpon_2021-08-12T23:51:23.142Z","groupId":"index_parallel_new-data-source_mfchdpon_2021-08-12T23:51:23.142Z","type":"index_parallel","createdTime":"2021-08-12T23:51:23.151Z","queueInsertionTime":"1970-01-01T00:00:00.000Z","statusCode":"SUCCESS","status":"SUCCESS","runnerStatusCode":"NONE","duration":25546,"location":{"host":"{MIDDLE_MANAGER_HOST}","port":8100,"tlsPort":-1},"dataSource":"new-data-source","errorMsg":null}]
   ```
   
   #### Search from code
   When the task API is called with the `createdTimeInterval` parameter, the code below executes.
   It computes a duration from the time interval.
   https://github.com/apache/druid/blob/druid-0.21.1/indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java#L616
   ```java
         Duration createdTimeDuration = null;
         if (createdTimeInterval != null) {
           final Interval theInterval = Intervals.of(StringUtils.replace(createdTimeInterval, "_", "/"));
           createdTimeDuration = theInterval.toDuration();
         }
         final List<TaskInfo<Task, TaskStatus>> taskInfoList =
             taskStorageQueryAdapter.getCompletedTaskInfoByCreatedTimeDuration(maxCompletedTasks, createdTimeDuration, dataSource);
   ```
   
   And `getCompletedTaskInfoByCreatedTimeDuration` calls the `getRecentlyCreatedAlreadyFinishedTaskInfo` method.
   
   https://github.com/apache/druid/blob/druid-0.21.1/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskStorageQueryAdapter.java#L61
   ```java
     public List<TaskInfo<Task, TaskStatus>> getCompletedTaskInfoByCreatedTimeDuration(
         @Nullable Integer maxTaskStatuses,
         @Nullable Duration duration,
         @Nullable String dataSource
     )
     {
       return storage.getRecentlyCreatedAlreadyFinishedTaskInfo(maxTaskStatuses, duration, dataSource);
     }
   ```
   
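   The `getRecentlyCreatedAlreadyFinishedTaskInfo` method copies the list of completed tasks whose createdTime falls between (now - duration) and now, so the start and end of the requested interval are ignored and only its length is used.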
   https://github.com/apache/druid/blob/druid-0.21.1/indexing-service/src/main/java/org/apache/druid/indexing/overlord/MetadataTaskStorage.java#L223
   
   ```java
     @Override
     public List<TaskInfo<Task, TaskStatus>> getRecentlyCreatedAlreadyFinishedTaskInfo(
         @Nullable Integer maxTaskStatuses,
         @Nullable Duration durationBeforeNow,
         @Nullable String datasource
     )
     {
       return ImmutableList.copyOf(
           handler.getCompletedTaskInfo(
               DateTimes.nowUtc()
                        .minus(durationBeforeNow == null ? config.getRecentlyFinishedThreshold() : durationBeforeNow),
               maxTaskStatuses,
               datasource
           )
       );
     }
   ```
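   
   To illustrate why the two requests above behave the way they do, here is a minimal sketch of the duration-based filtering. It uses `java.time` instead of Druid's Joda-Time types, and the class name, method names, the request time `now`, and the inclusive lower bound are all assumptions made for illustration, not Druid code.
   
   ```java
   import java.time.Duration;
   import java.time.Instant;
   
   // Sketch only: models the behavior described above, not Druid's actual classes.
   class CreatedTimeDurationSketch {
   
       // Like Intervals.of(...).toDuration(): only the length of the interval survives.
       static Duration toDuration(String createdTimeInterval) {
           String[] parts = createdTimeInterval.split("_");
           return Duration.between(Instant.parse(parts[0]), Instant.parse(parts[1]));
       }
   
       // A task is returned when it was created at or after (now - duration).
       // The inclusive comparison is an assumption of this sketch.
       static boolean isReturned(Instant createdTime, String createdTimeInterval, Instant now) {
           Instant cutoff = now.minus(toDuration(createdTimeInterval));
           return !createdTime.isBefore(cutoff);
       }
   
       public static void main(String[] args) {
           // Assumed request time; the real time of the curl calls is not known.
           Instant now = Instant.parse("2021-08-13T00:33:25Z");
           Instant created = Instant.parse("2021-08-12T23:51:23.151Z");
   
           // 10-minute interval that contains the task: cutoff is 00:23:25, so prints false.
           System.out.println(isReturned(
               created, "2021-08-12T23:50:00.000Z_2021-08-13T00:00:00.000Z", now));
   
           // Interval in January (1 day and 10 seconds long): cutoff is 2021-08-12T00:33:15, so prints true.
           System.out.println(isReturned(
               created, "2021-01-01T00:00:00.000Z_2021-01-02T00:00:10.000Z", now));
       }
   }
   ```
   
   Under these assumptions, the result depends only on how long the interval is and when the request is made, which matches the empty response for the first interval and the unexpected match for the second.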
   
   So I suggest adding a `getCompletedTaskInfoByCreatedTimeInterval` method to the `TaskStorageQueryAdapter` class that takes `maxTaskStatuses`, `interval`, and `dataSource` as arguments.
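   
   As a rough sketch of what interval-based filtering could look like, the example below keeps only tasks whose createdTime lies inside the requested interval. `CompletedTask` and `filterByCreatedTimeInterval` are hypothetical names, `java.time` stands in for Joda-Time, and the half-open bound (start inclusive, end exclusive) is an assumption. A real change would more likely push the bounds into the metadata store query instead of filtering in memory.
   
   ```java
   import java.time.Instant;
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   
   // Sketch only: CompletedTask is a hypothetical stand-in, not Druid's TaskInfo.
   class CreatedTimeIntervalSketch {
   
       static final class CompletedTask {
           final String id;
           final Instant createdTime;
   
           CompletedTask(String id, Instant createdTime) {
               this.id = id;
               this.createdTime = createdTime;
           }
       }
   
       // Keeps tasks with start <= createdTime < end, parsed from "start_end".
       static List<CompletedTask> filterByCreatedTimeInterval(
           List<CompletedTask> tasks, String createdTimeInterval) {
           String[] parts = createdTimeInterval.split("_");
           Instant start = Instant.parse(parts[0]);
           Instant end = Instant.parse(parts[1]);
           return tasks.stream()
               .filter(t -> !t.createdTime.isBefore(start) && t.createdTime.isBefore(end))
               .collect(Collectors.toList());
       }
   
       public static void main(String[] args) {
           List<CompletedTask> tasks = Arrays.asList(
               new CompletedTask(
                   "index_parallel_new-data-source_mfchdpon_2021-08-12T23:51:23.142Z",
                   Instant.parse("2021-08-12T23:51:23.151Z")));
   
           // Interval that contains the task: prints 1.
           System.out.println(filterByCreatedTimeInterval(
               tasks, "2021-08-12T23:50:00.000Z_2021-08-13T00:00:00.000Z").size());
   
           // Interval in January that does not contain the task: prints 0.
           System.out.println(filterByCreatedTimeInterval(
               tasks, "2021-01-01T00:00:00.000Z_2021-01-02T00:00:10.000Z").size());
       }
   }
   ```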


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on issue #11589: Cannot retrieve proper tasks list from Overlord Task API with createdTimeInterval parameter

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on issue #11589:
URL: https://github.com/apache/druid/issues/11589#issuecomment-898858176


   The original PR (#5801) that introduced the `createdTimeInterval` parameter treated the interval as a duration.
   
   From the user's side, I think interval semantics are the right choice rather than duration semantics, because they would let us find the right tasks in a given time range.
   
   cc @jihoonson 

