Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/13 22:26:36 UTC

[GitHub] [druid] 2bethere commented on a change in pull request #11245: add docs for high-churn datasource cleanup

2bethere commented on a change in pull request #11245:
URL: https://github.com/apache/druid/pull/11245#discussion_r631446053



##########
File path: docs/operations/clean-metadata-store.md
##########
@@ -0,0 +1,104 @@
+---
+id: clean-metadata-store
+title: "Automated cleanup for metadata records related to deleted datasources"
+sidebar_label: Automated metadata store cleanup
+description: "Defines a strategy to maintain Druid metadata store performance by automatically removing leftover records for deleted datasources. Most applicable to databases with 'high-churn' datasources."
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+When you delete a datasource from Apache Druid, some records related to the datasource may remain in the metadata store including:
+
+- audit records
+- supervisor records
+- rule records
+- compaction configuration records
+- datasource records created by supervisors
+
+If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources, the leftover records can start to fill your metadata store and cause performance issues. To maintain metadata store performance in this case, you can configure Apache Druid to automatically remove records associated with deleted datasources from the metadata store.
+
+## Automated cleanup strategies
+There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:
+- Proactively, if you know you have many high-churn datasources. For example you have scripts that create and delete supervisors regularly.
+- If you have issues with the hard disk for your metadata database filling up.
+- If you run into performance issues with the metadata database. For example API calls are very slow or fail to execute.
+
+Do not use the metadata store automated cleanup features if you have requirements to retain metadata records. For example, you have compliance requirements to keep audit records. In these cases, you should come up with an alternate method to preserve the audit metadata while freeing up your active metadata store.
+
+## Configure automated metadata cleanup
+Automated cleanup only removes records for deleted datasources. You can configure cleanup on a per-table basis as follows:
+ - `druid.coordinator.kill.*.on` enables cleanup for a particular metadata table.
+ - `druid.coordinator.kill.*.period` defines the frequency in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) for the cleanup job to check for and delete eligible records.
+ - `druid.coordinator.kill.*.durationToRetain` defines the retention period in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) after creation that the entity becomes eligible for deletion. For example to retain records for 30 days from their creation date: `P30D`.

Review comment:
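       I think you should clarify whether the retention period is inclusive, that is, whether a record becomes eligible for deletion at an age of exactly 30 days (>= 30) or only after it (> 30).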
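
To make the ambiguity concrete, here is a minimal sketch of the two possible readings. It assumes eligibility is decided by comparing a record's age with `durationToRetain`. `is_eligible` is a hypothetical helper written for this comment, not Druid's implementation, and the dates are made up.

```python
from datetime import datetime, timedelta, timezone


def is_eligible(created, now, retain, inclusive):
    """Return True if a record created at `created` may be deleted at `now`.

    Hypothetical helper for illustration only; not Druid's implementation.
    """
    age = now - created
    # The two readings differ only when the age equals the retention period exactly.
    return age >= retain if inclusive else age > retain


retain = timedelta(days=30)  # corresponds to the ISO 8601 duration P30D
now = datetime(2021, 5, 13, tzinfo=timezone.utc)
created = now - retain  # a record that is exactly 30 days old

print(is_eligible(created, now, retain, inclusive=True))   # True
print(is_eligible(created, now, retain, inclusive=False))  # False
```

The docs should state which of these two behaviors applies, since only the boundary case distinguishes them.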

##########
File path: docs/operations/clean-metadata-store.md
##########
@@ -0,0 +1,104 @@
+---
+id: clean-metadata-store
+title: "Automated cleanup for metadata records related to deleted datasources"
+sidebar_label: Automated metadata store cleanup
+description: "Defines a strategy to maintain Druid metadata store performance by automatically removing leftover records for deleted datasources. Most applicable to databases with 'high-churn' datasources."
+---
+
+When you delete a datasource from Apache Druid, some records related to the datasource may remain in the metadata store including:
+
+- audit records
+- supervisor records
+- rule records
+- compaction configuration records
+- datasource records created by supervisors
+
+If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources, the leftover records can start to fill your metadata store and cause performance issues. To maintain metadata store performance in this case, you can configure Apache Druid to automatically remove records associated with deleted datasources from the metadata store.
+
+## Automated cleanup strategies
+There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:
+- Proactively, if you know you have many high-churn datasources. For example you have scripts that create and delete supervisors regularly.
+- If you have issues with the hard disk for your metadata database filling up.
+- If you run into performance issues with the metadata database. For example API calls are very slow or fail to execute.
+
+Do not use the metadata store automated cleanup features if you have requirements to retain metadata records. For example, you have compliance requirements to keep audit records. In these cases, you should come up with an alternate method to preserve the audit metadata while freeing up your active metadata store.

Review comment:
       ```suggestion
   If you have compliance requirements to keep audit records and you decide to enable automated metadata store cleanup, use an alternative method to preserve the audit metadata. For example, periodically export audit records from the metadata store to external storage.
   ```

##########
File path: docs/operations/clean-metadata-store.md
##########
@@ -0,0 +1,104 @@
+---
+id: clean-metadata-store
+title: "Automated cleanup for metadata records related to deleted datasources"
+sidebar_label: Automated metadata store cleanup
+description: "Defines a strategy to maintain Druid metadata store performance by automatically removing leftover records for deleted datasources. Most applicable to databases with 'high-churn' datasources."
+---
+
+When you delete a datasource from Apache Druid, some records related to the datasource may remain in the metadata store including:
+
+- audit records
+- supervisor records
+- rule records
+- compaction configuration records
+- datasource records created by supervisors
+
+If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources, the leftover records can start to fill your metadata store and cause performance issues. To maintain metadata store performance in this case, you can configure Apache Druid to automatically remove records associated with deleted datasources from the metadata store.
+
+## Automated cleanup strategies
+There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:
+- Proactively, if you know you have many high-churn datasources. For example you have scripts that create and delete supervisors regularly.
+- If you have issues with the hard disk for your metadata database filling up.
+- If you run into performance issues with the metadata database. For example API calls are very slow or fail to execute.
+
+Do not use the metadata store automated cleanup features if you have requirements to retain metadata records. For example, you have compliance requirements to keep audit records. In these cases, you should come up with an alternate method to preserve the audit metadata while freeing up your active metadata store.
+
+## Configure automated metadata cleanup
+Automated cleanup only removes records for deleted datasources. You can configure cleanup on a per-table basis as follows:
+ - `druid.coordinator.kill.*.on` enables cleanup for a particular metadata table.
+ - `druid.coordinator.kill.*.period` defines the frequency in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) for the cleanup job to check for and delete eligible records.
+ - `druid.coordinator.kill.*.durationToRetain` defines the retention period in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) after creation that the entity becomes eligible for deletion. For example to retain records for 30 days from their creation date: `P30D`.
+
+ For full configuration details, see [Metadata management](../configuration/index.md#metadata-management).
+
+The following are eligible for automatic cleanup:

Review comment:
       ```suggestion
   The following metadata store tables are eligible for automatic cleanup:
   ```

##########
File path: docs/operations/clean-metadata-store.md
##########
@@ -0,0 +1,104 @@
+---
+id: clean-metadata-store
+title: "Automated cleanup for metadata records related to deleted datasources"
+sidebar_label: Automated metadata store cleanup
+description: "Defines a strategy to maintain Druid metadata store performance by automatically removing leftover records for deleted datasources. Most applicable to databases with 'high-churn' datasources."
+---
+
+When you delete a datasource from Apache Druid, some records related to the datasource may remain in the metadata store including:
+
+- audit records
+- supervisor records
+- rule records
+- compaction configuration records
+- datasource records created by supervisors
+
+If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources, the leftover records can start to fill your metadata store and cause performance issues. To maintain metadata store performance in this case, you can configure Apache Druid to automatically remove records associated with deleted datasources from the metadata store.
+
+## Automated cleanup strategies
+There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:
+- Proactively, if you know you have many high-churn datasources. For example you have scripts that create and delete supervisors regularly.
+- If you have issues with the hard disk for your metadata database filling up.
+- If you run into performance issues with the metadata database. For example API calls are very slow or fail to execute.
+
+Do not use the metadata store automated cleanup features if you have requirements to retain metadata records. For example, you have compliance requirements to keep audit records. In these cases, you should come up with an alternate method to preserve the audit metadata while freeing up your active metadata store.
+
+## Configure automated metadata cleanup
+Automated cleanup only removes records for deleted datasources. You can configure cleanup on a per-table basis as follows:
+ - `druid.coordinator.kill.*.on` enables cleanup for a particular metadata table.
+ - `druid.coordinator.kill.*.period` defines the frequency in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) for the cleanup job to check for and delete eligible records.

Review comment:
       What's the default value for `period`?
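
Whatever the default is, it may also be worth documenting how `period` interacts with `durationToRetain`. The sketch below assumes the cleanup job runs on a fixed schedule and deletes every eligible record on each run, so a record can outlive its retention period by up to one `period`. `first_cleanup_after` and all dates are hypothetical and not taken from Druid's code.

```python
from datetime import datetime, timedelta, timezone


def first_cleanup_after(eligible_at, first_run, period):
    """Return the first scheduled run at or after `eligible_at`.

    Hypothetical helper: assumes runs happen at first_run, first_run + period, ...
    """
    if eligible_at <= first_run:
        return first_run
    elapsed = eligible_at - first_run
    # Ceiling division: whole periods needed to reach or pass eligible_at.
    runs = -(-elapsed // period)
    return first_run + runs * period


period = timedelta(days=1)    # corresponds to P1D, chosen only for illustration
retain = timedelta(days=30)   # corresponds to P30D
first_run = datetime(2021, 5, 13, tzinfo=timezone.utc)
created = datetime(2021, 5, 1, 6, 0, tzinfo=timezone.utc)

eligible_at = created + retain
removed_at = first_cleanup_after(eligible_at, first_run, period)
print(eligible_at.isoformat())   # 2021-05-31T06:00:00+00:00
print(removed_at.isoformat())    # 2021-06-01T00:00:00+00:00
```

Under these assumptions the record lingers 18 hours past its retention period, which is why the default `period` matters to readers sizing their metadata store.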



