Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/01/17 22:35:10 UTC

[GitHub] [druid] lrosenman opened a new issue #9212: azure extensions: task log kill crashes with "Not Implemented".

lrosenman opened a new issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212
 
 
   ### Affected Version
   0.16.0-incubating
   
   ### Description
   I have a small 3-master, 2-query, 2-storage node cluster and need to be able to shrink the druid_tasks table.  
   
   I added the following to my overlord configuration:
   `# remove old logs
   druid.indexer.logs.kill.enabled=true
   # remove logs older than (in ms):
   # 14 days * 24 hours * 60 minutes * 60 seconds * 1000 ms
   druid.indexer.logs.kill.durationToRetain=1209600000`
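
As a quick check of the arithmetic in the configuration comment, 14 days can be converted to milliseconds like this. This is only a small Python sketch; the variable name `retain_ms` is illustrative and not part of Druid.

```python
from datetime import timedelta

# 14 days expressed in whole milliseconds (exact integer division of durations).
retain_ms = timedelta(days=14) // timedelta(milliseconds=1)
print(retain_ms)  # 1209600000
```

The result matches the value passed to `druid.indexer.logs.kill.durationToRetain` above.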
   
   and when the kill task ran, I got:
   
   > 2020-01-17T22:04:38,578 ERROR [Overlord-Helper-Manager-Exec--0] org.apache.druid.indexing.overlord.helpers.TaskLogAutoCleaner - Failed to clean-up the task logs
   java.lang.UnsupportedOperationException: not implemented
           at org.apache.druid.storage.azure.AzureTaskLogs.killOlderThan(AzureTaskLogs.java:159) ~[?:?]
           at org.apache.druid.indexing.overlord.helpers.TaskLogAutoCleaner$1.run(TaskLogAutoCleaner.java:75) [druid-indexing-service-0.16.0-incubating.jar:0.16.0-incubating]
           at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:55) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
           at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:51) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
           at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:92) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232]
           at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_232]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_232]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
   
   Is there a plan to implement this?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lrosenman edited a comment on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
lrosenman edited a comment on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-575830872
 
 
   Looking at s3, google, and azure, NONE of them implement
   killAll()
   killOlderThan(long timestamp)
   
   the only implementation of those two functions is in hdfs. 
   
   so, how is one supposed to be able to shrink the size of the druid_tasks metadata table, and reduce storage usage in the cloud based tasklogs storage?



[GitHub] [druid] jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-590613679
 
 
   Yeah, its schema is not very pretty. I would say you will be safe as long as you delete only the rows of inactive tasks whose `created_date` is older than (`current_timestamp()` - `druid.indexer.storage.recentlyFinishedThreshold`). The actual SQL used in Druid is [`DELETE FROM %s WHERE created_date < :date_time AND active = false`](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/metadata/SQLMetadataStorageActionHandler.java#L468).
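
The rule described above (delete only inactive rows whose `created_date` is older than now minus the threshold) can be sketched as follows. This is a hedged illustration, not Druid's code: it uses an in-memory SQLite table as a stand-in for the real metadata store, the three-column schema is a simplification, the 24-hour threshold is just an example value, and the helper names `iso_millis` and `delete_old_inactive` are hypothetical. It also assumes `created_date` holds fixed-width UTC ISO-8601 strings, so that string comparison orders the same way as time.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def iso_millis(dt):
    """Format a datetime as a fixed-width UTC ISO-8601 string with milliseconds."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def delete_old_inactive(conn, now, threshold):
    """Delete inactive rows created before (now - threshold); return rows deleted."""
    cutoff = iso_millis(now - threshold)
    # SQLite stores booleans as integers, so "active = false" becomes "active = 0".
    cur = conn.execute(
        "DELETE FROM druid_tasks WHERE created_date < :date_time AND active = 0",
        {"date_time": cutoff},
    )
    conn.commit()
    return cur.rowcount


# Stand-in table with a simplified, assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE druid_tasks (id TEXT PRIMARY KEY, created_date TEXT, active INTEGER)"
)
now = datetime(2020, 2, 25, tzinfo=timezone.utc)
rows = [
    ("old_done", iso_millis(now - timedelta(days=40)), 0),
    ("old_running", iso_millis(now - timedelta(days=40)), 1),
    ("recent_done", iso_millis(now - timedelta(hours=1)), 0),
]
conn.executemany("INSERT INTO druid_tasks VALUES (?, ?, ?)", rows)

deleted = delete_old_inactive(conn, now, timedelta(hours=24))
remaining = sorted(r[0] for r in conn.execute("SELECT id FROM druid_tasks"))
print(deleted, remaining)  # 1 ['old_running', 'recent_done']
```

Only the old, inactive row is removed; the still-active task and the recently finished task are left alone.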



[GitHub] [druid] bharadwajrembar commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
bharadwajrembar commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-606670244
 
 
   We hit the same problem with S3 on version `0.17.0`. @jihoonson We use the Router for our UI. When we go into the `Ingestion` tab, the API fires this SQL query:
   
   ```
   SELECT "task_id", "group_id", "type", "datasource", "created_time", "location", "duration", "error_msg",
     CASE WHEN "status" = 'RUNNING' THEN "runner_status" ELSE "status" END AS "status",
     (CASE WHEN "status" = 'RUNNING'
           THEN (CASE "runner_status" WHEN 'RUNNING' THEN 4 WHEN 'PENDING' THEN 3 ELSE 2 END)
           ELSE 1
      END) AS "rank"
   FROM sys.tasks
   ORDER BY "rank" DESC, "created_time" DESC
   ```
   and it takes some time to load.
   
   Does this query go to the Overlord? If so, is having a large number of old tasks in the metadata likely to affect performance or even potentially cause OOMs in the Overlord?



[GitHub] [druid] lrosenman commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
lrosenman commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-575830872
 
 
   Looking at s3, google, hdfs, and azure, NONE of them implement
   killAll()
   killOlderThan(long timestamp)
   
   so, how is one supposed to be able to shrink the size of the druid_tasks metadata table, and reduce storage usage in the cloud based tasklogs storage?





[GitHub] [druid] jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-606755674
 
 
   @bharadwajrembar, yes, the SQL internally calls an Overlord API, which needs to materialize old tasks in memory; this can cause an OOM if you have a huge number of tasks. The OOM doesn't usually happen because the API doesn't return tasks older than `druid.indexer.storage.recentlyFinishedThreshold`.
   
   BTW, this issue was fixed in https://github.com/apache/druid/pull/9523 which will be included in 0.18.0. I'm closing this issue for now. Please reopen it if you think this issue still exists.



[GitHub] [druid] jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-590587873
 
 
   @lrosenman thank you for the report. As you mentioned, those methods are not implemented properly for all deep storage types. I think this is a bug, but AFAIK, there is no workaround except for having an external cron job that periodically cleans up the table.
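
An external cleanup job of the kind mentioned above might look roughly like the following sketch. It is an assumption-laden illustration rather than a supported tool: it runs against an in-memory SQLite table with a simplified schema so that it is self-contained, the function name `purge_in_batches` and the batch size are hypothetical, and a real job would use the driver for the actual metadata database and be scheduled externally (for example from cron). Deleting in bounded batches keeps each transaction short.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete inactive rows older than `cutoff` in batches; return total rows deleted."""
    total = 0
    while True:
        # Pick at most `batch_size` matching ids, then delete just those rows.
        cur = conn.execute(
            "DELETE FROM druid_tasks WHERE id IN ("
            " SELECT id FROM druid_tasks"
            " WHERE created_date < ? AND active = 0 LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Stand-in table with a simplified, assumed schema and synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE druid_tasks (id TEXT PRIMARY KEY, created_date TEXT, active INTEGER)"
)
rows = [(f"old_{i}", "2019-12-01T00:00:00.000Z", 0) for i in range(2500)]
rows += [(f"running_{i}", "2019-12-01T00:00:00.000Z", 1) for i in range(10)]
rows += [(f"recent_{i}", "2020-02-24T00:00:00.000Z", 0) for i in range(5)]
conn.executemany("INSERT INTO druid_tasks VALUES (?, ?, ?)", rows)

total_deleted = purge_in_batches(conn, "2020-01-25T00:00:00.000Z")
remaining_count = conn.execute("SELECT COUNT(*) FROM druid_tasks").fetchone()[0]
print(total_deleted, remaining_count)  # 2500 15
```

The `IN (SELECT ... LIMIT ?)` form works in SQLite and PostgreSQL; other databases may need a different way to bound each batch.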



[GitHub] [druid] lrosenman edited a comment on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
lrosenman edited a comment on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-590616757
 
 
   ```
   druid=# select count(*) from druid_tasks where created_date::timestamptz < now() - '30 days'::interval and not active;
    count
   --------
    949649
   (1 row)
   ```
   
   If I change the select count(*) to delete, is this safe?



[GitHub] [druid] lrosenman commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
lrosenman commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-575826110
 
 
   I'd also love (in the interim) a supported way to shrink the druid_tasks table, which currently has >900,000 rows, takes 2.2G of storage, and sometimes gets locked while my backups are running, killing the backup.
   
   



[GitHub] [druid] lrosenman edited a comment on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
lrosenman edited a comment on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-575826110
 
 
   I'd also love (in the interim) a supported way to shrink the druid_tasks table, which currently has >900,000 rows, takes 2.2G of storage, and sometimes gets locked (on the PostgreSQL slave I use for backups) while my backups are running, killing the backup.
   
   





[GitHub] [druid] lrosenman commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
lrosenman commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-590602925
 
 
   How safe is deleting rows from the table? And given that the datestamps are varchars and not real timestamps, I'm loath to try it via SQL.
   



[GitHub] [druid] jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212#issuecomment-590618171
 
 
   Yes. It should be safe.



[GitHub] [druid] jihoonson closed issue #9212: azure extensions: task log kill crashes with "Not Implemented".

Posted by GitBox <gi...@apache.org>.
jihoonson closed issue #9212: azure extensions: task log kill crashes with "Not Implemented".
URL: https://github.com/apache/druid/issues/9212
 
 
   


