Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/10/12 01:29:14 UTC

[GitHub] [druid] paul-rogers commented on a diff in pull request #13131: Update clean-metadata-store.md

paul-rogers commented on code in PR #13131:
URL: https://github.com/apache/druid/pull/13131#discussion_r992900167


##########
docs/operations/clean-metadata-store.md:
##########
@@ -143,7 +143,7 @@ Datasource cleanup uses the following configuration:
 
 ### Indexer task logs
 
-You can configure the Overlord to delete indexer task log metadata and the indexer task logs from local disk or from cloud storage.
+You can configure the Overlord to delete indexer task log metadata and the indexer task logs from local disk or from cloud storage.  The cleanup includes the `druid_tasks` and `druid_tasklogs` tables in the metadata database, and the task logs in deep storage.  (Note that `druid_tasklogs` is no longer used and will already be empty, unless the druid version is older.) 

Review Comment:
   Is this cleanup absolute? That is, does it delete everything, or only items older than some expiration age? (If everything, then there is a feature request to have that cutoff.)
   
   Once configured, does the cleanup occur on a schedule, or after every task?
   
   I'd be very surprised if the cleanup actually "removes the `druid_tasks` and `druid_tasklogs` tables". This seems extreme, and introduces race conditions. Does it actually "drop all records in the `druid_tasks` and `druid_tasklogs` tables"?
   
   If we take the brute-force, drop-all-info approach, then I suggest:
   
   > You can configure the Overlord to delete information for all indexer tasks that have either completed or failed. During cleanup, the Overlord drops all records from the `druid_tasks` and `druid_tasklogs` tables in the metadata database. The Overlord also removes all task logs from deep storage.
   
   What would be great is if we could say:
   
   > You can configure the Overlord to periodically expire (delete) indexer task information. The Overlord deletes tasks that have either completed or failed if those tasks are older than the expiration period. During cleanup, the Overlord drops expired records from the `druid_tasks` and `druid_tasklogs` tables in the metadata database. The Overlord also removes expired task logs from deep storage.
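
To make the difference between the two wordings concrete, here is a minimal sketch of the expiration-based selection described in the second suggestion. It is not Druid's implementation: `TaskRecord`, `select_expired`, the status strings, and the one-day retention period are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical record type for illustration; not Druid's actual schema.
@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    status: str        # "RUNNING", "SUCCESS", or "FAILED" (illustrative values)
    created: datetime


def select_expired(tasks, now, retain):
    """Return finished tasks created before the retention cutoff."""
    cutoff = now - retain
    return [
        t for t in tasks
        if t.status in ("SUCCESS", "FAILED") and t.created < cutoff
    ]


now = datetime(2022, 10, 12, tzinfo=timezone.utc)
tasks = [
    TaskRecord("a", "SUCCESS", now - timedelta(days=3)),   # finished and old
    TaskRecord("b", "FAILED", now - timedelta(hours=2)),   # finished but recent
    TaskRecord("c", "RUNNING", now - timedelta(days=5)),   # old but still running
    TaskRecord("d", "FAILED", now - timedelta(days=2)),    # finished and old
]

# Assumed retention period of one day, purely for the example.
expired = select_expired(tasks, now, timedelta(days=1))
```

Under these assumptions only tasks `a` and `d` are selected. The brute-force wording corresponds to dropping the age check, which would also select `b`.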



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@druid.apache.org