Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/04/28 00:08:23 UTC

[GitHub] [druid] maytasm opened a new issue #11173: Auto cleanup of metadata tables to enable high frequency creation/deletion of data sources

maytasm opened a new issue #11173:
URL: https://github.com/apache/druid/issues/11173


   ### Motivation
   
   We want to be able to control the size of Druid's metadata store in the face of high-frequency creation/deletion of datasources and their metadata (such as load/drop rules, auto-compaction config, etc.). These datasources can be very short-lived, created and deleted within a short span of time, resulting in high datasource churn. We have to make sure that Druid's performance does not degrade as the number of churned (created and deleted) datasources increases. Currently, the metadata store grows as datasources are created, and it does not shrink even after those datasources are deleted.
   
   ### Proposed changes
   
   Druid currently only supports `druid.indexer.logs.kill.*` to clean up task logs in deep storage and entries in druid_tasks. We should add a similar capability for the other metadata tables that grow with the creation/deletion of datasources. These metadata tables include druid_audit, druid_supervisors, the compaction config in druid_config, druid_rules, and druid_datasources. These auto cleanups will run as part of a Coordinator duty. Each individual table cleanup can be enabled/disabled and configured separately depending on user needs. Each table cleanup will have the following properties:
   `druid.coordinator.kill.*.on` - For enabling/disabling the cleanup for a particular metadata table
   `druid.coordinator.kill.*.period` - For configuring how often to run the cleanup duty
   `druid.coordinator.kill.*.durationToRetain` - For configuring how long entries are retained, measured from their creation time
   
   Note that all of these configurations already exist for the task log cleanup.
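   As a concrete illustration, a Coordinator `runtime.properties` enabling cleanup for the audit table might look like the sketch below. This assumes the `*` in the proposed property names is filled in with a per-table name such as `audit`, and uses ISO-8601 durations as the existing task log cleanup configs do:

   ```properties
   # Hypothetical example: run the audit-table cleanup duty once a day,
   # deleting entries older than 7 days (property names follow the
   # proposed druid.coordinator.kill.* pattern; the table name "audit"
   # is an assumption for illustration)
   druid.coordinator.kill.audit.on=true
   druid.coordinator.kill.audit.period=P1D
   druid.coordinator.kill.audit.durationToRetain=P7D
   ```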
   
   ### Rationale
   
   These features would allow customers to sustain high churn of various entities in Druid (such as datasources, rules, configs, etc.) out of the box. Customers will not run into unexpectedly poor cluster performance when they churn entities at a high rate (since we do not have any guardrails to prevent high churn from occurring). This provides a better experience for customers using Druid and eliminates potential support tickets related to high-churn entities.
   
   The rationale for the solution is to keep the UI/UX similar to the existing implementation of `druid.indexer.logs.kill.*` for cleaning up task logs. This makes the operation and usage of the new features coherent with the existing task log cleanup, for ease of use.
   
   ### Operational impact
   
   Nothing is deprecated or removed by this change, so there is no migration path that cluster operators need to be aware of. All of the features introduced will be behind feature flags and hence, by default, will not impact rolling upgrades or rolling downgrades. With these changes, a Druid operator who knows they have high-frequency creation/deletion of datasources can set a few cluster properties to enable this behavior. Once it is enabled, users will not have to manage any external system, script, or job to clean up the metadata store. Users will be able to churn various entities in Druid (such as datasources, rules, configs, etc.) without any maintenance; everything will be managed automatically within Druid.
   
   ### Test plan (optional)
   
   A test is put together that loops through creating a datasource, enabling auto-compaction, posting a supervisor, posting 3 rules, then deleting the supervisor, the auto-compaction config, and the datasource (all with different datasource names). While running this, the test also monitors the Druid metadata tables in Derby, the sys tables (probably the same as the metadata tables), and ZooKeeper metrics. The test will be run for 500 iterations on a brand-new local cluster for about 13 hours; we can then verify the growth of the metadata table sizes.
   Some additional tests we can also do include:
   - Rolling churn with many datasources (i.e. keep 200 datasources at all times while churning through one at a time)
   - Churn multiple at a time (i.e. keep 200 datasources at all times while churning multiple at a time)
   - Churn of rules entities (this is missing from earlier test)
   - Churn datasource without delete supervisor
   - Churn datasource without delete compaction rule
   - Churn supervisor/compaction/rule without delete datasource 
   - Churn datasource without delete rules
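   The churn loop above can be sketched as a per-iteration sequence of HTTP calls. The endpoint paths below are assumptions based on Druid's Coordinator/Overlord HTTP APIs and may need adjusting to the cluster version; the function name and datasource naming are hypothetical:

   ```python
   # Sketch of one iteration of the churn test described above.
   # Endpoint paths are assumptions based on Druid's Coordinator/Overlord
   # HTTP APIs; request bodies (supervisor spec, compaction config, rules)
   # are omitted for brevity.

   def churn_iteration(datasource: str) -> list[tuple[str, str]]:
       """Return the (HTTP method, path) sequence for one create/delete cycle."""
       return [
           # create: post a supervisor that ingests into the datasource
           ("POST", "/druid/indexer/v1/supervisor"),
           # create: post an auto-compaction config for the datasource
           ("POST", "/druid/coordinator/v1/config/compaction"),
           # create: post the load/drop rules (3 rules) for the datasource
           ("POST", f"/druid/coordinator/v1/rules/{datasource}"),
           # delete: terminate the supervisor
           ("POST", f"/druid/indexer/v1/supervisor/{datasource}/terminate"),
           # delete: remove the auto-compaction config
           ("DELETE", f"/druid/coordinator/v1/config/compaction/{datasource}"),
           # delete: disable (mark unused) the datasource
           ("DELETE", f"/druid/coordinator/v1/datasources/{datasource}"),
       ]

   calls = churn_iteration("churn_ds_0")
   print(len(calls))  # 6 calls per iteration
   ```

   Running this for 500 iterations with distinct datasource names reproduces the churn pattern the metadata tables are monitored under.
   
   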


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on issue #11173: Auto cleanup of metadata tables to enable high frequency creation/deletion of data sources

Posted by GitBox <gi...@apache.org>.
jon-wei commented on issue #11173:
URL: https://github.com/apache/druid/issues/11173#issuecomment-831607417


   👍 , I think this proposal makes sense




[GitHub] [druid] maytasm closed issue #11173: Auto cleanup of metadata tables to enable high frequency creation/deletion of data sources

Posted by GitBox <gi...@apache.org>.
maytasm closed issue #11173:
URL: https://github.com/apache/druid/issues/11173


   




[GitHub] [druid] suneet-s commented on issue #11173: Auto cleanup of metadata tables to enable high frequency creation/deletion of data sources

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #11173:
URL: https://github.com/apache/druid/issues/11173#issuecomment-828912883


   +1 on the approach and the config names. I'm also in favor of more granular controls per table instead of an overall config that applies to all the tables as the growth rate per table can be very different depending on the use case. This approach will allow a cluster admin to decide on the best config for their use case.
   
   When implementing this, we should verify that Druid has the correct indexes on each table so that deleting these entities does not become a bottleneck.




[GitHub] [druid] maytasm commented on issue #11173: Auto cleanup of metadata tables to enable high frequency creation/deletion of data sources

Posted by GitBox <gi...@apache.org>.
maytasm commented on issue #11173:
URL: https://github.com/apache/druid/issues/11173#issuecomment-841008818


   All the implementations for this proposal are done and merged.
   The PRs:
   https://github.com/apache/druid/pull/11232
   https://github.com/apache/druid/pull/11227
   https://github.com/apache/druid/pull/11200
   https://github.com/apache/druid/pull/11164
   https://github.com/apache/druid/pull/11084
   https://github.com/apache/druid/pull/11078

