You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/08 22:02:11 UTC

[GitHub] [druid] maytasm opened a new pull request, #12415: Add docs to metric spec for auto compaction

maytasm opened a new pull request, #12415:
URL: https://github.com/apache/druid/pull/12415

   Add docs to metric spec for auto compaction
   
   ### Description
   
   Add docs to metric spec for auto compaction
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm merged pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
maytasm merged PR #12415:
URL: https://github.com/apache/druid/pull/12415


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
vtlim commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r848738936


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom `metricsSpec`. The compaction task uses the specified `metricsSpec` rather than generating one. Note that if `metricsSpec` is specified then the compaction task will preserve existing metrics of the original data. This means that for any row that already has metric (with the same name defined in metricSpec), the metric aggregator in metricSpec is skipped and the existing metric is preserved. If the row does not already have the metric, then the metric aggregator is applied on the source column as usual. Metrics is then combined and rollup across segments as usual.|No|

Review Comment:
   Left a suggestion with that detail included



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
maytasm commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r847986676


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom `metricsSpec`. The compaction task uses the specified `metricsSpec` rather than generating one. Note that if `metricsSpec` is specified then the compaction task will preserve existing metrics of the original data. This means that for any row that already has metric (with the same name defined in metricSpec), the metric aggregator in metricSpec is skipped and the existing metric is preserved. If the row does not already have the metric, then the metric aggregator is applied on the source column as usual. Metrics is then combined and rollup across segments as usual.|No|

Review Comment:
   @vtlim thanks for the suggestion. I think it's great. One thing I am wondering if you can help add is indicating that when `metricsSpec` is not specified and Druid discovers metric that cannot be combine across the segments then the compaction task can fail. For example, if segment 1 has metric `foo` which is a LongSum and segment 2 has metric name `foo` which is a LongMin (it does not make sense how to combine sum of something and min of something). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
vtlim commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r848736156


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing and compacted segments and combines existing metrics with the same metric name across segments.|No|

Review Comment:
   ```suggestion
   |`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing and compacted segments and combines existing metrics with the same metric name across segments. Metrics of the same name are assumed to be from the same type of aggregator, otherwise the compaction task may fail.|No|
   ```
   Does Druid fail if it tries to combine similar aggregators, like `longSum` and `doubleSum`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
maytasm commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r848745045


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing and compacted segments and combines existing metrics with the same metric name across segments.|No|

Review Comment:
   I like the rephrase more. Each aggregators has it's own logic on what it can combine with and cannot combine with. Most aggregators can only be combine with the same class as itself but that is not always the case. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
vtlim commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r847616165


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom `metricsSpec`. The compaction task uses the specified `metricsSpec` rather than generating one. Note that if `metricsSpec` is specified then the compaction task will preserve existing metrics of the original data. This means that for any row that already has metric (with the same name defined in metricSpec), the metric aggregator in metricSpec is skipped and the existing metric is preserved. If the row does not already have the metric, then the metric aggregator is applied on the source column as usual. Metrics is then combined and rollup across segments as usual.|No|

Review Comment:
   ```suggestion
   |`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing and compacted segments and combines existing metrics with the same metric name across segments.|No|
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
vtlim commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r848736156


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing and compacted segments and combines existing metrics with the same metric name across segments.|No|

Review Comment:
   ```suggestion
   |`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing and compacted segments and combines existing metrics with the same metric name across segments. Metrics of the same name are assumed to be from the same type of aggregator, otherwise the compaction task may fail.|No|
   ```
   Does Druid fail if it tries to combine similar aggregators, like `longSum` and `doubleSum`? If those can be combined successfully, could rephrase to something like "Metrics of the same name are assumed to be compatible for combining across segments, otherwise the compaction task may fail."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vtlim commented on a diff in pull request #12415: Add docs to metric spec for auto compaction

Posted by GitBox <gi...@apache.org>.
vtlim commented on code in PR #12415:
URL: https://github.com/apache/druid/pull/12415#discussion_r848750303


##########
docs/configuration/index.md:
##########
@@ -970,6 +970,7 @@ A description of the compaction config is:
 |`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec)|No|
 |`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
 |`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transform-spec)|No|
+|`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing segments and combines existing metrics with the same metric name across segments. Aggregator of metrics of the same name are assumed to be compatible for combining across segments, otherwise the compaction task may fail.|No|

Review Comment:
   ```suggestion
   |`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing segments and combines existing metrics with the same metric name across segments. Aggregators for metrics with the same name are assumed to be compatible for combining across segments, otherwise the compaction task may fail.|No|
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org