You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "bsyk (via GitHub)" <gi...@apache.org> on 2023/11/08 03:42:39 UTC

[PR] Add SpectatorHistogram extension (druid)

bsyk opened a new pull request, #15340:
URL: https://github.com/apache/druid/pull/15340

   ### Description
   
   Adds the contribution extension providing support for SpectatorHistogram. A fast, small alternative to data-sketches or T-Digest for computing approximate percentiles.
   
   See documentation included in the PR for more details.
   
   <hr>
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   This PR has:
   
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1446427421


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider SpectatorHistogram to compute percentile approximations. This extension has a reduced storage footprint compared to the [DataSketches extension](../extensions-core/datasketches-extension.md), which results in smaller segment sizes, faster loading from deep storage, and lower memory usage. This extension provides fast and accurate queries on large datasets at low storage cost.
+
+This aggregator only applies when your raw data contains positive long integer values. Do not use this aggregator if you have negative values in your data.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* `wikipedia` contains the dataset ingested as is, without rollup
+* `wikipedia_spectator` contains the dataset with a single extra metric column of type `spectatorHistogram` for the `added` column
+* `wikipedia_datasketch` contains the dataset with a single extra metric column of type `quantilesDoublesSketch` for the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the `quantilesDoublesSketch`
+adds 48 bytes per row. This represents an eightfold reduction in additional storage size for spectator histograms.
+
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size savings. For example, when you ingest the Wikipedia dataset
+with day-grain query granularity and remove all dimensions except `countryName`,
+this results in a segment that has just 106 rows. The base segment has 87 bytes per row.
+Compare the following bytes per row for SpectatorHistogram versus DataSketches:
+* An additional `spectatorHistogram` column adds 27 bytes per row on average.
+* An additional `quantilesDoublesSketch` column adds 255 bytes per row.
+
+SpectatorHistogram reduces the additional storage size by 9.4 times in this example.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in DataSketches `quantilesDoublesSketch` aggregator, but is
+opinionated and optimized for typical measurements from cloud services and web apps.
+For example, measurements such as page load time, transferred bytes, response time, and request latency.

Review Comment:
   ```suggestion
   It provides similar functionality to the built-in DataSketches `quantilesDoublesSketch` aggregator, but is
   opinionated to maintain higher absolute accuracy at smaller values.
   Larger values have lower absolute accuracy; however, relative accuracy is maintained across the range.
   See [Bucket boundaries](#histogram-bucket-boundaries) for more information.
   The SpectatorHistogram is optimized for typical measurements from cloud services and web apps,
   such as measurements such as page load time, transferred bytes, response time, and request latency.
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "bsyk (via GitHub)" <gi...@apache.org>.
bsyk commented on PR #15340:
URL: https://github.com/apache/druid/pull/15340#issuecomment-1883628874

   > Left one more comment to try to explain the opinionated behavior. Otherwise LGTM for docs.
   
   Fantastic. Thanks for all your suggestions, they've made the docs a lot more clear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1445441959


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries
+These are the upper bounds of each bucket index. There are 276 buckets.
+The first bucket index is 0 and the last bucket index is 275.
+As you can see the bucket widths increase as the bucket index increases. This leads to a greater absolute error for larger values, but maintains a relative error of rough percentage across the number range.
+i.e the maximum error at value 10 is 0 since the bucket width is 1. But for a value of 16,000,000,000 the bucket width is 1,431,655,768 giving an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range (0.1%, 3%).

Review Comment:
   ```suggestion
   For example, the maximum error at value 10 is zero since the bucket width is 1 (the difference of 11-10). For a value of 16,000,000,000, the bucket width is 1,431,655,768 (from 17179869184-15748213416). This gives an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range of (0.1%, 3%).
   ```
   Not sure if it's 11-10 or 10-9. Also it's not clear how you get 8.9%. Consider explaining in more detail, or linking to relevant docs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "maytasm (via GitHub)" <gi...@apache.org>.
maytasm merged PR #15340:
URL: https://github.com/apache/druid/pull/15340


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "maytasm (via GitHub)" <gi...@apache.org>.
maytasm commented on PR #15340:
URL: https://github.com/apache/druid/pull/15340#issuecomment-1889750276

   @suneet-s @vtlim @adarshsanjeev 
   Are we all good on merging this change in? I'll merge this change in at the end of the day if no one has any objection.
   This change is contained in it's own extension (and is a contrib extension) so should be safe to merge.
   Including my approval, we have 3 +1s from committers and total reviews from 4 committers. 
   I think this extension does serve a specific use case and does it well. This could be beneficial to some people and hence I do see a value in getting this merge in so that Druid users can easily use/try it out. 
   By getting this extension merge in, we can also keep this extension up-to-dated with any new Druid changes, encourage improvements to this extension from the community (such as adding vectorization, etc) and get feedback about the extension. Note that at Netflix, we have been running this extension at scale in production for a few years so it is also battle tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1443047161


##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogramAggregatorFactory.java:
##########
@@ -0,0 +1,373 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.druid.query.aggregation.AggregateCombiner;
+import org.apache.druid.query.aggregation.Aggregator;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.AggregatorFactoryNotMergeableException;
+import org.apache.druid.query.aggregation.AggregatorUtil;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.aggregation.ObjectAggregateCombiner;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(SpectatorHistogramAggregatorFactory.TYPE_NAME)
+public class SpectatorHistogramAggregatorFactory extends AggregatorFactory
+{
+
+  @Nonnull
+  private final String name;
+  @Nonnull
+  private final String fieldName;
+
+  @Nonnull
+  private final byte cacheTypeId;
+
+  public static final String TYPE_NAME = "spectatorHistogram";
+
+  @JsonCreator
+  public SpectatorHistogramAggregatorFactory(
+      @JsonProperty("name") final String name,
+      @JsonProperty("fieldName") final String fieldName
+  )
+  {
+    this(name, fieldName, AggregatorUtil.SPECTATOR_HISTOGRAM_CACHE_TYPE_ID);
+  }
+
+  public SpectatorHistogramAggregatorFactory(
+      final String name,
+      final String fieldName,
+      final byte cacheTypeId
+  )
+  {
+    this.name = Objects.requireNonNull(name, "Must have a valid, non-null aggregator name");
+    this.fieldName = Objects.requireNonNull(fieldName, "Parameter fieldName must be specified");
+    this.cacheTypeId = cacheTypeId;
+  }
+
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(
+        cacheTypeId
+    ).appendString(fieldName).build();
+  }
+
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+    return new SpectatorHistogramAggregator(metricFactory.makeColumnValueSelector(fieldName));
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
+  {
+    return new SpectatorHistogramBufferAggregator(metricFactory.makeColumnValueSelector(fieldName));
+  }
+
+  // This is used when writing metrics to segment files to check whether the column is sorted.
+  // Since there is no sensible way really to compare histograms, compareTo always returns 1.
+  public static final Comparator<SpectatorHistogram> COMPARATOR = (o, o1) -> {
+    if (o == null && o1 == null) {
+      return 0;
+    } else if (o != null && o1 == null) {
+      return -1;
+    } else if (o == null) {
+      return 1;
+    }
+    return Integer.compare(o.hashCode(), o1.hashCode());
+  };
+
+  @Override
+  public Comparator getComparator()
+  {
+    return COMPARATOR;
+  }
+
+  @Override
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)
+  {
+    if (lhs == null) {
+      return rhs;
+    }
+    if (rhs == null) {
+      return lhs;
+    }
+    SpectatorHistogram lhsHisto = (SpectatorHistogram) lhs;
+    SpectatorHistogram rhsHisto = (SpectatorHistogram) rhs;
+    lhsHisto.merge(rhsHisto);
+    return lhsHisto;
+  }
+
+  @Override
+  public AggregatorFactory getCombiningFactory()
+  {
+    return new SpectatorHistogramAggregatorFactory(name, name);
+  }
+
+  @Override
+  public AggregatorFactory getMergingFactory(AggregatorFactory other) throws AggregatorFactoryNotMergeableException
+  {
+    if (other.getName().equals(this.getName()) && this.getClass() == other.getClass()) {
+      return getCombiningFactory();
+    } else {
+      throw new AggregatorFactoryNotMergeableException(this, other);
+    }
+  }
+
+  @Override
+  public List<AggregatorFactory> getRequiredColumns()
+  {
+    return Collections.singletonList(
+        new SpectatorHistogramAggregatorFactory(
+            fieldName,
+            fieldName
+        )
+    );
+  }
+
+  @Override
+  public Object deserialize(Object serializedHistogram)
+  {
+    return SpectatorHistogram.deserialize(serializedHistogram);
+  }
+
+  @Nullable
+  @Override
+  public Object finalizeComputation(@Nullable Object object)
+  {
+    return object;
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @JsonProperty
+  public String getFieldName()
+  {
+    return fieldName;
+  }
+
+  @Override
+  public List<String> requiredFields()
+  {
+    return Collections.singletonList(fieldName);
+  }
+
+  @Override
+  public String getComplexTypeName()
+  {
+    return TYPE_NAME;
+  }
+
+  @Override
+  public ValueType getType()
+  {
+    return ValueType.COMPLEX;
+  }
+
+  @Override
+  public ValueType getFinalizedType()
+  {
+    return ValueType.COMPLEX;
+  }
+
+  @Override
+  public int getMaxIntermediateSize()
+  {
+    return SpectatorHistogram.getMaxIntermdiateHistogramSize();
+  }
+
+  @Override
+  public AggregateCombiner makeAggregateCombiner()
+  {
+    return new ObjectAggregateCombiner<SpectatorHistogram>()
+    {
+      private SpectatorHistogram combined = null;
+
+      @Override
+      public void reset(final ColumnValueSelector selector)
+      {
+        combined = null;
+        fold(selector);
+      }
+
+      @Override
+      public void fold(final ColumnValueSelector selector)
+      {
+        SpectatorHistogram other = (SpectatorHistogram) selector.getObject();
+        if (other == null) {
+          return;
+        }
+        if (combined == null) {
+          combined = new SpectatorHistogram();
+        }
+        combined.merge(other);
+      }
+
+      @Nullable
+      @Override
+      public SpectatorHistogram getObject()
+      {
+        return combined;
+      }
+
+      @Override
+      public Class<SpectatorHistogram> classOfObject()
+      {
+        return SpectatorHistogram.class;
+      }
+    };
+  }
+
+  @Override
+  public boolean equals(final Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || !getClass().equals(o.getClass())) {
+      return false;
+    }
+    final SpectatorHistogramAggregatorFactory that = (SpectatorHistogramAggregatorFactory) o;
+
+    //TODO: samarth should we check for equality of contents in count arrays?

Review Comment:
   Also to be resolved



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.
github-advanced-security[bot] commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1443541142


##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogramComplexMetricSerde.java:
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.segment.GenericColumnSerializer;
+import org.apache.druid.segment.column.ColumnBuilder;
+import org.apache.druid.segment.data.ObjectStrategy;
+import org.apache.druid.segment.serde.ComplexMetricExtractor;
+import org.apache.druid.segment.serde.ComplexMetricSerde;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+
+import java.nio.ByteBuffer;
+
+public class SpectatorHistogramComplexMetricSerde extends ComplexMetricSerde
+{
+  private static final SpectatorHistogramObjectStrategy STRATEGY = new SpectatorHistogramObjectStrategy();
+  private final String typeName;
+
+  SpectatorHistogramComplexMetricSerde(String type)
+  {
+    this.typeName = type;
+  }
+
+  @Override
+  public String getTypeName()
+  {
+    return typeName;
+  }
+
+  @Override
+  public ComplexMetricExtractor getExtractor()
+  {
+    return new ComplexMetricExtractor()
+    {
+      @Override
+      public Class<SpectatorHistogram> extractedClass()
+      {
+        return SpectatorHistogram.class;
+      }
+
+      @Override
+      public Object extractValue(final InputRow inputRow, final String metricName)
+      {
+        final Object object = inputRow.getRaw(metricName);
+        if (object == null || object instanceof SpectatorHistogram || object instanceof Number) {
+          return object;
+        }
+        if (object instanceof String) {
+          String objectString = (String) object;
+          // Ignore empty values
+          if (objectString.trim().isEmpty()) {
+            return null;
+          }
+          // Treat as long number, if it looks like a number
+          if (Character.isDigit((objectString).charAt(0))) {
+            return Long.parseLong((String) object);

Review Comment:
   ## Missing catch of NumberFormatException
   
   Potential uncaught 'java.lang.NumberFormatException'.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5964)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5967)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(3L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(2L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(bigValue)));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesSmallishValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 64L);
+    histogram.add(PercentileBuckets.indexOf(30), 127L);
+    histogram.add(PercentileBuckets.indexOf(40), 111L);
+    histogram.add(PercentileBuckets.indexOf(50), 99L);
+    histogram.add(270, 100L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Byte.BYTES;
+    Assert.assertEquals("Should compact small values to a byte", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(64L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(127L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(111L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(99L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(100L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 512L);
+    histogram.add(PercentileBuckets.indexOf(30), 1024L);
+    histogram.add(PercentileBuckets.indexOf(40), 2048L);
+    histogram.add(PercentileBuckets.indexOf(50), 4096L);
+    histogram.add(270, 8192L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Short.BYTES;
+    Assert.assertEquals("Should compact medium values to short", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(512L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1024L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(2048L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(4096L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(8192L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesLargerValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 100000L);
+    histogram.add(PercentileBuckets.indexOf(30), 200000L);
+    histogram.add(PercentileBuckets.indexOf(40), 500000L);
+    histogram.add(PercentileBuckets.indexOf(50), 10000000L);
+    histogram.add(270, 50000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Integer.BYTES;
+    Assert.assertEquals("Should compact larger values to integer", 5 * (keySize + valSize), bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5970)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramAggregatorTest.java:
##########
@@ -0,0 +1,733 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.jackson.DefaultObjectMapper;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryPlus;
+import org.apache.druid.query.QueryRunner;
+import org.apache.druid.query.QueryRunnerTestHelper;
+import org.apache.druid.query.Result;
+import org.apache.druid.query.aggregation.AggregationTestHelper;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.AggregatorUtil;
+import org.apache.druid.query.groupby.GroupByQueryConfig;
+import org.apache.druid.query.groupby.GroupByQueryRunnerTest;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.metadata.SegmentMetadataQueryConfig;
+import org.apache.druid.query.metadata.SegmentMetadataQueryQueryToolChest;
+import org.apache.druid.query.metadata.SegmentMetadataQueryRunnerFactory;
+import org.apache.druid.query.metadata.metadata.ColumnAnalysis;
+import org.apache.druid.query.metadata.metadata.SegmentAnalysis;
+import org.apache.druid.query.metadata.metadata.SegmentMetadataQuery;
+import org.apache.druid.query.timeseries.TimeseriesResultValue;
+import org.apache.druid.segment.IndexIO;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexSegment;
+import org.apache.druid.segment.TestHelper;
+import org.apache.druid.segment.column.ColumnConfig;
+import org.apache.druid.testing.InitializedNullHandlingTest;
+import org.apache.druid.timeline.SegmentId;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+@RunWith(Parameterized.class)
+public class SpectatorHistogramAggregatorTest extends InitializedNullHandlingTest
+{
+  public static final String INPUT_DATA_PARSE_SPEC = String.join(
+      "\n",
+      "{",
+      "  \"type\": \"string\",",
+      "  \"parseSpec\": {",
+      "    \"format\": \"tsv\",",
+      "    \"timestampSpec\": {\"column\": \"timestamp\", \"format\": \"yyyyMMddHH\"},",
+      "    \"dimensionsSpec\": {",
+      "      \"dimensions\": [\"product\"],",
+      "      \"dimensionExclusions\": [],",
+      "      \"spatialDimensions\": []",
+      "    },",
+      "    \"columns\": [\"timestamp\", \"product\", \"cost\"]",
+      "  }",
+      "}"
+  );
+  @Rule
+  public final TemporaryFolder tempFolder = new TemporaryFolder();
+
+  private static final SegmentMetadataQueryRunnerFactory METADATA_QR_FACTORY = new SegmentMetadataQueryRunnerFactory(
+      new SegmentMetadataQueryQueryToolChest(new SegmentMetadataQueryConfig()),
+      QueryRunnerTestHelper.NOOP_QUERYWATCHER
+  );
+  private static final Map<String, SpectatorHistogram> EXPECTED_HISTOGRAMS = new HashMap<>();
+
+  static {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 1L);
+    EXPECTED_HISTOGRAMS.put("A", histogram);
+
+    histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(30 + 40 + 40 + 40 + 50 + 50), 1L);
+    EXPECTED_HISTOGRAMS.put("B", histogram);
+
+    histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(50 + 20000), 1L);
+    EXPECTED_HISTOGRAMS.put("C", histogram);
+  }
+
+  private final AggregationTestHelper helper;
+  private final AggregationTestHelper timeSeriesHelper;
+
+  public SpectatorHistogramAggregatorTest(final GroupByQueryConfig config)
+  {
+    SpectatorHistogramModule.registerSerde();
+    SpectatorHistogramModule module = new SpectatorHistogramModule();
+    helper = AggregationTestHelper.createGroupByQueryAggregationTestHelper(
+        module.getJacksonModules(), config, tempFolder);
+    timeSeriesHelper = AggregationTestHelper.createTimeseriesQueryAggregationTestHelper(
+        module.getJacksonModules(),
+        tempFolder
+    );
+  }
+
+  @Parameterized.Parameters(name = "{0}")
+  public static Collection<?> constructorFeeder()
+  {
+    final List<Object[]> constructors = new ArrayList<>();
+    for (GroupByQueryConfig config : GroupByQueryRunnerTest.testConfigs()) {
+      constructors.add(new Object[]{config});
+    }
+    return constructors;
+  }
+
+  // this is to test Json properties and equals
+  @Test
+  public void serializeDeserializeFactoryWithFieldName() throws Exception
+  {
+    ObjectMapper objectMapper = new DefaultObjectMapper();
+    new SpectatorHistogramModule().getJacksonModules().forEach(objectMapper::registerModule);
+    SpectatorHistogramAggregatorFactory factory = new SpectatorHistogramAggregatorFactory(
+        "name",
+        "filedName",
+        AggregatorUtil.SPECTATOR_HISTOGRAM_CACHE_TYPE_ID
+    );
+    AggregatorFactory other = objectMapper.readValue(
+        objectMapper.writeValueAsString(factory),
+        AggregatorFactory.class
+    );
+
+    Assert.assertEquals(factory, other);
+  }
+
+  @Test
+  public void testBuildingHistogramQueryTime() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"longSum\", \"name\": \"cost_sum\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimensions\": [\"product\"],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"cost_histogram\", \"fieldName\": "
+            + "\"cost_sum\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    List<ResultRow> results = seq.toList();
+    assertResultsMatch(results, 0, "A");
+    assertResultsMatch(results, 1, "B");
+    assertResultsMatch(results, 2, "C");
+  }
+
+  @Test
+  public void testBuildingAndMergingHistograms() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    Assert.assertEquals(expected, results.get(0).get(0));
+  }
+
+  @Test
+  public void testBuildingAndMergingHistogramsTimeseriesQuery() throws Exception
+  {
+    Object rawseq = timeSeriesHelper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"timeseries\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    Sequence<Result<TimeseriesResultValue>> seq = (Sequence<Result<TimeseriesResultValue>>) rawseq;
+    List<Result<TimeseriesResultValue>> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    SpectatorHistogram value = (SpectatorHistogram) results.get(0).getValue().getMetric("merged_cost_histogram");
+    Assert.assertEquals(expected, value);
+  }
+
+  @Test
+  public void testBuildingAndMergingGroupbyHistograms() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimensions\": [\"product\"],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(6, results.size());
+
+    SpectatorHistogram expectedA = new SpectatorHistogram();
+    expectedA.add(PercentileBuckets.indexOf(10), 1L);
+    Assert.assertEquals(expectedA, results.get(0).get(1));
+
+    SpectatorHistogram expectedB = new SpectatorHistogram();
+    expectedB.add(PercentileBuckets.indexOf(30), 1L);
+    expectedB.add(PercentileBuckets.indexOf(40), 3L);
+    expectedB.add(PercentileBuckets.indexOf(50), 2L);
+    Assert.assertEquals(expectedB, results.get(1).get(1));
+
+    SpectatorHistogram expectedC = new SpectatorHistogram();
+    expectedC.add(PercentileBuckets.indexOf(50), 1L);
+    expectedC.add(PercentileBuckets.indexOf(20000), 1L);
+    Assert.assertEquals(expectedC, results.get(2).get(1));
+
+    Assert.assertNull(results.get(3).get(1));
+    Assert.assertNull(results.get(4).get(1));
+    Assert.assertNull(results.get(5).get(1));
+  }
+
+  @Test
+  public void testBuildingAndCountingHistograms() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"longSum\", \"name\": \"count_histogram\", \"fieldName\": "
+            + "\"histogram\"},",
+            "    {\"type\": \"doubleSum\", \"name\": \"double_count_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    // Check longSum
+    Assert.assertEquals(9L, results.get(0).get(0));
+    // Check doubleSum
+    Assert.assertEquals(9.0, (Double) results.get(0).get(1), 0.001);
+  }
+
+  @Test
+  public void testBuildingAndCountingHistogramsWithNullFilter() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"longSum\", \"name\": \"count_histogram\", \"fieldName\": "
+            + "\"histogram\"},",
+            "    {\"type\": \"doubleSum\", \"name\": \"double_count_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"],",
+            "  \"filter\": {\n",
+            "    \"fields\": [\n",
+            "      {\n",
+            "        \"field\": {\n",
+            "          \"dimension\": \"histogram\",\n",
+            "          \"value\": \"0\",\n",
+            "          \"type\": \"selector\"\n",
+            "        },\n",
+            "        \"type\": \"not\"\n",
+            "      },\n",
+            "      {\n",
+            "        \"field\": {\n",
+            "          \"dimension\": \"histogram\",\n",
+            "          \"value\": \"\",\n",
+            "          \"type\": \"selector\"\n",
+            "        },\n",
+            "        \"type\": \"not\"\n",
+            "      }\n",
+            "    ],\n",
+            "    \"type\": \"and\"\n",
+            "  }",
+            "}"
+        )
+    );
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    // Check longSum
+    Assert.assertEquals(9L, results.get(0).get(0));
+    // Check doubleSum
+    Assert.assertEquals(9.0, (Double) results.get(0).get(1), 0.001);
+  }
+
+  @Test
+  public void testIngestAsHistogramDistribution() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramDistribution\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    Assert.assertEquals(expected, results.get(0).get(0));
+  }
+
+  @Test
+  public void testIngestHistogramsTimer() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramTimer\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    Assert.assertEquals(expected, results.get(0).get(0));
+  }
+
+  @Test
+  public void testIngestingPreaggregatedHistograms() throws Exception
+  {
+    Object rawseq = timeSeriesHelper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("pre_agg_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"timeseries\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    Sequence<Result<TimeseriesResultValue>> seq = (Sequence<Result<TimeseriesResultValue>>) rawseq;
+    List<Result<TimeseriesResultValue>> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    SpectatorHistogram value = (SpectatorHistogram) results.get(0).getValue().getMetric("merged_cost_histogram");
+    Assert.assertEquals(expected, value);
+  }
+
+  @Test
+  public void testMetadataQueryTimer() throws Exception
+  {
+    File segmentDir = tempFolder.newFolder();
+    helper.createIndex(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramTimer\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        segmentDir,
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        true
+    );
+
+    ObjectMapper mapper = (ObjectMapper) TestHelper.makeJsonMapper();
+    SpectatorHistogramModule module = new SpectatorHistogramModule();
+    module.getJacksonModules().forEach(mod -> mapper.registerModule(mod));
+    IndexIO indexIO = new IndexIO(
+        mapper,
+        new ColumnConfig() {}
+    );
+
+    QueryableIndex index = indexIO.loadIndex(segmentDir);
+
+    SegmentId segmentId = SegmentId.dummy("segmentId");
+    QueryRunner runner = QueryRunnerTestHelper.makeQueryRunner(
+        METADATA_QR_FACTORY,
+        segmentId,
+        new QueryableIndexSegment(index, segmentId),
+        null
+    );
+
+    SegmentMetadataQuery segmentMetadataQuery = Druids.newSegmentMetadataQueryBuilder()
+                                                      .dataSource("test_datasource")
+                                                      .intervals("2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z")
+                                                      .merge(true)
+                                                      .build();
+    List<SegmentAnalysis> results = runner.run(QueryPlus.wrap(segmentMetadataQuery)).toList();
+    System.out.println(results);
+    Assert.assertEquals(1, results.size());
+    Map<String, ColumnAnalysis> columns = results.get(0).getColumns();
+    Assert.assertNotNull(columns.get("histogram"));
+    Assert.assertEquals("spectatorHistogramTimer", columns.get("histogram").getType());
+  }
+
+  @Test
+  public void testMetadataQueryDistribution() throws Exception
+  {
+    File segmentDir = tempFolder.newFolder();
+    helper.createIndex(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramDistribution\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        segmentDir,
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        true
+    );
+
+    ObjectMapper mapper = (ObjectMapper) TestHelper.makeJsonMapper();
+    SpectatorHistogramModule module = new SpectatorHistogramModule();
+    module.getJacksonModules().forEach(mod -> mapper.registerModule(mod));
+    IndexIO indexIO = new IndexIO(
+        mapper,
+        new ColumnConfig() { }
+    );
+
+    QueryableIndex index = indexIO.loadIndex(segmentDir);
+
+    SegmentId segmentId = SegmentId.dummy("segmentId");
+    QueryRunner runner = QueryRunnerTestHelper.makeQueryRunner(
+        METADATA_QR_FACTORY,
+        segmentId,
+        new QueryableIndexSegment(index, segmentId),
+        null
+    );
+
+    SegmentMetadataQuery segmentMetadataQuery = Druids.newSegmentMetadataQueryBuilder()
+                                                      .dataSource("test_datasource")
+                                                      .intervals("2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z")
+                                                      .merge(true)
+                                                      .build();
+    List<SegmentAnalysis> results = runner.run(QueryPlus.wrap(segmentMetadataQuery)).toList();
+    System.out.println(results);
+    Assert.assertEquals(1, results.size());
+    Map<String, ColumnAnalysis> columns = results.get(0).getColumns();
+    Assert.assertNotNull(columns.get("histogram"));
+    Assert.assertEquals("spectatorHistogramDistribution", columns.get("histogram").getType());

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [ColumnAnalysis.getType](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5966)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramAggregatorTest.java:
##########
@@ -0,0 +1,733 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.jackson.DefaultObjectMapper;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryPlus;
+import org.apache.druid.query.QueryRunner;
+import org.apache.druid.query.QueryRunnerTestHelper;
+import org.apache.druid.query.Result;
+import org.apache.druid.query.aggregation.AggregationTestHelper;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.AggregatorUtil;
+import org.apache.druid.query.groupby.GroupByQueryConfig;
+import org.apache.druid.query.groupby.GroupByQueryRunnerTest;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.metadata.SegmentMetadataQueryConfig;
+import org.apache.druid.query.metadata.SegmentMetadataQueryQueryToolChest;
+import org.apache.druid.query.metadata.SegmentMetadataQueryRunnerFactory;
+import org.apache.druid.query.metadata.metadata.ColumnAnalysis;
+import org.apache.druid.query.metadata.metadata.SegmentAnalysis;
+import org.apache.druid.query.metadata.metadata.SegmentMetadataQuery;
+import org.apache.druid.query.timeseries.TimeseriesResultValue;
+import org.apache.druid.segment.IndexIO;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexSegment;
+import org.apache.druid.segment.TestHelper;
+import org.apache.druid.segment.column.ColumnConfig;
+import org.apache.druid.testing.InitializedNullHandlingTest;
+import org.apache.druid.timeline.SegmentId;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+@RunWith(Parameterized.class)
+public class SpectatorHistogramAggregatorTest extends InitializedNullHandlingTest
+{
+  public static final String INPUT_DATA_PARSE_SPEC = String.join(
+      "\n",
+      "{",
+      "  \"type\": \"string\",",
+      "  \"parseSpec\": {",
+      "    \"format\": \"tsv\",",
+      "    \"timestampSpec\": {\"column\": \"timestamp\", \"format\": \"yyyyMMddHH\"},",
+      "    \"dimensionsSpec\": {",
+      "      \"dimensions\": [\"product\"],",
+      "      \"dimensionExclusions\": [],",
+      "      \"spatialDimensions\": []",
+      "    },",
+      "    \"columns\": [\"timestamp\", \"product\", \"cost\"]",
+      "  }",
+      "}"
+  );
+  @Rule
+  public final TemporaryFolder tempFolder = new TemporaryFolder();
+
+  private static final SegmentMetadataQueryRunnerFactory METADATA_QR_FACTORY = new SegmentMetadataQueryRunnerFactory(
+      new SegmentMetadataQueryQueryToolChest(new SegmentMetadataQueryConfig()),
+      QueryRunnerTestHelper.NOOP_QUERYWATCHER
+  );
+  private static final Map<String, SpectatorHistogram> EXPECTED_HISTOGRAMS = new HashMap<>();
+
+  static {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 1L);
+    EXPECTED_HISTOGRAMS.put("A", histogram);
+
+    histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(30 + 40 + 40 + 40 + 50 + 50), 1L);
+    EXPECTED_HISTOGRAMS.put("B", histogram);
+
+    histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(50 + 20000), 1L);
+    EXPECTED_HISTOGRAMS.put("C", histogram);
+  }
+
+  private final AggregationTestHelper helper;
+  private final AggregationTestHelper timeSeriesHelper;
+
+  public SpectatorHistogramAggregatorTest(final GroupByQueryConfig config)
+  {
+    SpectatorHistogramModule.registerSerde();
+    SpectatorHistogramModule module = new SpectatorHistogramModule();
+    helper = AggregationTestHelper.createGroupByQueryAggregationTestHelper(
+        module.getJacksonModules(), config, tempFolder);
+    timeSeriesHelper = AggregationTestHelper.createTimeseriesQueryAggregationTestHelper(
+        module.getJacksonModules(),
+        tempFolder
+    );
+  }
+
+  @Parameterized.Parameters(name = "{0}")
+  public static Collection<?> constructorFeeder()
+  {
+    final List<Object[]> constructors = new ArrayList<>();
+    for (GroupByQueryConfig config : GroupByQueryRunnerTest.testConfigs()) {
+      constructors.add(new Object[]{config});
+    }
+    return constructors;
+  }
+
+  // this is to test Json properties and equals
+  @Test
+  public void serializeDeserializeFactoryWithFieldName() throws Exception
+  {
+    ObjectMapper objectMapper = new DefaultObjectMapper();
+    new SpectatorHistogramModule().getJacksonModules().forEach(objectMapper::registerModule);
+    SpectatorHistogramAggregatorFactory factory = new SpectatorHistogramAggregatorFactory(
+        "name",
+        "filedName",
+        AggregatorUtil.SPECTATOR_HISTOGRAM_CACHE_TYPE_ID
+    );
+    AggregatorFactory other = objectMapper.readValue(
+        objectMapper.writeValueAsString(factory),
+        AggregatorFactory.class
+    );
+
+    Assert.assertEquals(factory, other);
+  }
+
+  @Test
+  public void testBuildingHistogramQueryTime() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"longSum\", \"name\": \"cost_sum\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimensions\": [\"product\"],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"cost_histogram\", \"fieldName\": "
+            + "\"cost_sum\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    List<ResultRow> results = seq.toList();
+    assertResultsMatch(results, 0, "A");
+    assertResultsMatch(results, 1, "B");
+    assertResultsMatch(results, 2, "C");
+  }
+
+  @Test
+  public void testBuildingAndMergingHistograms() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    Assert.assertEquals(expected, results.get(0).get(0));
+  }
+
+  @Test
+  public void testBuildingAndMergingHistogramsTimeseriesQuery() throws Exception
+  {
+    Object rawseq = timeSeriesHelper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"timeseries\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    Sequence<Result<TimeseriesResultValue>> seq = (Sequence<Result<TimeseriesResultValue>>) rawseq;
+    List<Result<TimeseriesResultValue>> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    SpectatorHistogram value = (SpectatorHistogram) results.get(0).getValue().getMetric("merged_cost_histogram");
+    Assert.assertEquals(expected, value);
+  }
+
+  @Test
+  public void testBuildingAndMergingGroupbyHistograms() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimensions\": [\"product\"],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(6, results.size());
+
+    SpectatorHistogram expectedA = new SpectatorHistogram();
+    expectedA.add(PercentileBuckets.indexOf(10), 1L);
+    Assert.assertEquals(expectedA, results.get(0).get(1));
+
+    SpectatorHistogram expectedB = new SpectatorHistogram();
+    expectedB.add(PercentileBuckets.indexOf(30), 1L);
+    expectedB.add(PercentileBuckets.indexOf(40), 3L);
+    expectedB.add(PercentileBuckets.indexOf(50), 2L);
+    Assert.assertEquals(expectedB, results.get(1).get(1));
+
+    SpectatorHistogram expectedC = new SpectatorHistogram();
+    expectedC.add(PercentileBuckets.indexOf(50), 1L);
+    expectedC.add(PercentileBuckets.indexOf(20000), 1L);
+    Assert.assertEquals(expectedC, results.get(2).get(1));
+
+    Assert.assertNull(results.get(3).get(1));
+    Assert.assertNull(results.get(4).get(1));
+    Assert.assertNull(results.get(5).get(1));
+  }
+
+  @Test
+  public void testBuildingAndCountingHistograms() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"longSum\", \"name\": \"count_histogram\", \"fieldName\": "
+            + "\"histogram\"},",
+            "    {\"type\": \"doubleSum\", \"name\": \"double_count_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    // Check longSum
+    Assert.assertEquals(9L, results.get(0).get(0));
+    // Check doubleSum
+    Assert.assertEquals(9.0, (Double) results.get(0).get(1), 0.001);
+  }
+
+  @Test
+  public void testBuildingAndCountingHistogramsWithNullFilter() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"longSum\", \"name\": \"count_histogram\", \"fieldName\": "
+            + "\"histogram\"},",
+            "    {\"type\": \"doubleSum\", \"name\": \"double_count_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"],",
+            "  \"filter\": {\n",
+            "    \"fields\": [\n",
+            "      {\n",
+            "        \"field\": {\n",
+            "          \"dimension\": \"histogram\",\n",
+            "          \"value\": \"0\",\n",
+            "          \"type\": \"selector\"\n",
+            "        },\n",
+            "        \"type\": \"not\"\n",
+            "      },\n",
+            "      {\n",
+            "        \"field\": {\n",
+            "          \"dimension\": \"histogram\",\n",
+            "          \"value\": \"\",\n",
+            "          \"type\": \"selector\"\n",
+            "        },\n",
+            "        \"type\": \"not\"\n",
+            "      }\n",
+            "    ],\n",
+            "    \"type\": \"and\"\n",
+            "  }",
+            "}"
+        )
+    );
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    // Check longSum
+    Assert.assertEquals(9L, results.get(0).get(0));
+    // Check doubleSum
+    Assert.assertEquals(9.0, (Double) results.get(0).get(1), 0.001);
+  }
+
+  @Test
+  public void testIngestAsHistogramDistribution() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramDistribution\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    Assert.assertEquals(expected, results.get(0).get(0));
+  }
+
+  @Test
+  public void testIngestHistogramsTimer() throws Exception
+  {
+    Sequence<ResultRow> seq = helper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramTimer\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"groupBy\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"dimenions\": [],",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    List<ResultRow> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    Assert.assertEquals(expected, results.get(0).get(0));
+  }
+
+  @Test
+  public void testIngestingPreaggregatedHistograms() throws Exception
+  {
+    Object rawseq = timeSeriesHelper.createIndexAndRunQueryOnSegment(
+        new File(this.getClass().getClassLoader().getResource("pre_agg_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogram\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        String.join(
+            "\n",
+            "{",
+            "  \"queryType\": \"timeseries\",",
+            "  \"dataSource\": \"test_datasource\",",
+            "  \"granularity\": \"ALL\",",
+            "  \"aggregations\": [",
+            "    {\"type\": \"spectatorHistogram\", \"name\": \"merged_cost_histogram\", \"fieldName\": "
+            + "\"histogram\"}",
+            "  ],",
+            "  \"intervals\": [\"2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z\"]",
+            "}"
+        )
+    );
+    SpectatorHistogram expected = new SpectatorHistogram();
+    expected.add(PercentileBuckets.indexOf(10), 1L);
+    expected.add(PercentileBuckets.indexOf(30), 1L);
+    expected.add(PercentileBuckets.indexOf(40), 3L);
+    expected.add(PercentileBuckets.indexOf(50), 3L);
+    expected.add(PercentileBuckets.indexOf(20000), 1L);
+
+    Sequence<Result<TimeseriesResultValue>> seq = (Sequence<Result<TimeseriesResultValue>>) rawseq;
+    List<Result<TimeseriesResultValue>> results = seq.toList();
+    Assert.assertEquals(1, results.size());
+    SpectatorHistogram value = (SpectatorHistogram) results.get(0).getValue().getMetric("merged_cost_histogram");
+    Assert.assertEquals(expected, value);
+  }
+
+  @Test
+  public void testMetadataQueryTimer() throws Exception
+  {
+    File segmentDir = tempFolder.newFolder();
+    helper.createIndex(
+        new File(this.getClass().getClassLoader().getResource("input_data.tsv").getFile()),
+        INPUT_DATA_PARSE_SPEC,
+        String.join(
+            "\n",
+            "[",
+            "  {\"type\": \"spectatorHistogramTimer\", \"name\": \"histogram\", \"fieldName\": \"cost\"}",
+            "]"
+        ),
+        segmentDir,
+        0, // minTimestamp
+        Granularities.NONE,
+        10, // maxRowCount
+        true
+    );
+
+    ObjectMapper mapper = (ObjectMapper) TestHelper.makeJsonMapper();
+    SpectatorHistogramModule module = new SpectatorHistogramModule();
+    module.getJacksonModules().forEach(mod -> mapper.registerModule(mod));
+    IndexIO indexIO = new IndexIO(
+        mapper,
+        new ColumnConfig() {}
+    );
+
+    QueryableIndex index = indexIO.loadIndex(segmentDir);
+
+    SegmentId segmentId = SegmentId.dummy("segmentId");
+    QueryRunner runner = QueryRunnerTestHelper.makeQueryRunner(
+        METADATA_QR_FACTORY,
+        segmentId,
+        new QueryableIndexSegment(index, segmentId),
+        null
+    );
+
+    SegmentMetadataQuery segmentMetadataQuery = Druids.newSegmentMetadataQueryBuilder()
+                                                      .dataSource("test_datasource")
+                                                      .intervals("2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z")
+                                                      .merge(true)
+                                                      .build();
+    List<SegmentAnalysis> results = runner.run(QueryPlus.wrap(segmentMetadataQuery)).toList();
+    System.out.println(results);
+    Assert.assertEquals(1, results.size());
+    Map<String, ColumnAnalysis> columns = results.get(0).getColumns();
+    Assert.assertNotNull(columns.get("histogram"));
+    Assert.assertEquals("spectatorHistogramTimer", columns.get("histogram").getType());

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [ColumnAnalysis.getType](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5965)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(3L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(2L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(bigValue)));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesSmallishValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 64L);
+    histogram.add(PercentileBuckets.indexOf(30), 127L);
+    histogram.add(PercentileBuckets.indexOf(40), 111L);
+    histogram.add(PercentileBuckets.indexOf(50), 99L);
+    histogram.add(270, 100L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Byte.BYTES;
+    Assert.assertEquals("Should compact small values to a byte", 5 * (keySize + valSize), bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5968)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(3L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(2L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(bigValue)));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesSmallishValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 64L);
+    histogram.add(PercentileBuckets.indexOf(30), 127L);
+    histogram.add(PercentileBuckets.indexOf(40), 111L);
+    histogram.add(PercentileBuckets.indexOf(50), 99L);
+    histogram.add(270, 100L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Byte.BYTES;
+    Assert.assertEquals("Should compact small values to a byte", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(64L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(127L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(111L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(99L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(100L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 512L);
+    histogram.add(PercentileBuckets.indexOf(30), 1024L);
+    histogram.add(PercentileBuckets.indexOf(40), 2048L);
+    histogram.add(PercentileBuckets.indexOf(50), 4096L);
+    histogram.add(270, 8192L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Short.BYTES;
+    Assert.assertEquals("Should compact medium values to short", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(512L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1024L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(2048L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(4096L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(8192L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesLargerValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 100000L);
+    histogram.add(PercentileBuckets.indexOf(30), 200000L);
+    histogram.add(PercentileBuckets.indexOf(40), 500000L);
+    histogram.add(PercentileBuckets.indexOf(50), 10000000L);
+    histogram.add(270, 50000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Integer.BYTES;
+    Assert.assertEquals("Should compact larger values to integer", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(100000L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(200000L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(500000L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(10000000L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(50000000L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesBiggestValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 10000000000L);
+    histogram.add(PercentileBuckets.indexOf(30), 20000000000L);
+    histogram.add(PercentileBuckets.indexOf(40), 50000000000L);
+    histogram.add(PercentileBuckets.indexOf(50), 100000000000L);
+    histogram.add(270, 5000000000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 5180000000000L, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Long.BYTES;
+    Assert.assertEquals("Should not compact larger values", 5 * (keySize + valSize), bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5971)



##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogram.java:
##########
@@ -0,0 +1,423 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializerProvider;
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import it.unimi.dsi.fastutil.shorts.Short2LongMap;
+import it.unimi.dsi.fastutil.shorts.Short2LongMaps;
+import it.unimi.dsi.fastutil.shorts.Short2LongOpenHashMap;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.jackson.JacksonUtils;
+import org.apache.druid.java.util.common.parsers.ParseException;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+// Since queries don't come from SpectatorHistogramAggregator in the case of
+// using longSum or doubleSum aggregations. They come from LongSumBufferAggregator.
+// Therefore, we extended Number here.
+// This will prevent class casting exceptions if trying to query with sum rather
+// than explicitly as a SpectatorHistogram
+//
+// The SpectatorHistogram is a Number. That number is of intValue(),
+// which is the count of the number of events in the histogram
+// (adding up the counts across all buckets).
+//
+// There are a few useful aggregators, which as Druid Native Queries use:
+// type: "longSum" - Aggregates and returns the number of events in the histogram.
+// i.e. the sum of all bucket counts.
+// type: "spectatorHistogramDistribution" - Aggregates and returns a map (bucketIndex -> bucketCount)
+// representing a SpectatorHistogram. The represented data is a distribution.
+// type: "spectatorHistogramTimer" - Aggregates and returns a map (bucketIndex -> bucketCount)
+// representing a SpectatorHistogram. The represented data is measuring time.
+public class SpectatorHistogram extends Number
+{
+  private static final int MAX_ENTRY_BYTES = Short.BYTES + Long.BYTES;
+  private static final int LOW_COUNT_FLAG = 0x0200;
+  private static final int BYTE_VALUE = 0x8000;
+  private static final int SHORT_VALUE = 0x4000;
+  private static final int INT_VALUE = 0xC000;
+  private static final int VALUE_SIZE_MASK = 0xFC00;
+  private static final int KEY_MASK = 0x01FF;
+
+  private static final ObjectMapper JSON_MAPPER = new ObjectMapper();
+
+  // Values are packed into few bytes depending on the size of the counts
+  // The bucket index falls in the range 0-276, so we need 9 bits for the bucket index.
+  // Counts can range from 1 to Long.MAX_VALUE, so we need 1 to 64 bits for the value.
+  // To optimize storage, we use the remaining top 7 bits of the bucket index short to
+  // encode the storage type for the count value.
+  // AAbb bbYx xxxx xxxx
+  //        |          +-- 9 bits - The bucket index
+  //        +------------- 1 bit  - Low-count flag, set if count <= 63
+  // ++++ ++-------------- 6 bits - If low-count flag is set,
+  //                                 The count value, zero extra bytes used.
+  //                                If low-count flag is not set,
+  //                                 The value length indicator as encoded below
+  // ++------------------- 2 bits - 00 = 8 bytes used for value
+  //                                10 = 1 byte used for value
+  //                                01 = 2 bytes used for value
+  //                                11 = 4 bytes used for value
+  //
+  // Example:
+  // ------------------------------------------------------------------------------------------
+  // Consider the histogram: [10, 30, 40x3, 50x2, 100x256]
+  // That is there is one value of 10, and 3 values of 40, etc. As shown in the table below:
+  //
+  // Bucket Index | Bucket Range | Bucket Count
+  //    10        |   [10,11)    |     1
+  //    17        |   [26,31)    |     1
+  //    19        |   [36,41)    |     3
+  //    21        |   [46,51)    |     2
+  //    25        |  [85,106)    |   256
+  //
+  // See com.netflix.spectator.api.histogram.PercentileBuckets
+  // for an explaination of how the bucket index is assigned
+  // to each of the values: (10, 17, 19, 21, 25).
+  //
+  // Based on the specification above the histogram is serialized into a
+  // byte array to minimize storage size:
+  // In Base 10: [64, 25, 1, 0, 6, 10, 6, 17, 14, 19, 10, 21]
+  // In Binary: [01000000, 00011001, 00000001, 00000000, 00000110, 00001010,
+  //             00000110, 00010001, 00001110, 00010011, 00001010, 00010101]
+  //
+  // Each groups of bits (which varies in length), represent a histogram bucket index and count
+  // 01000000000110010000000100000000
+  // 01 - Since the low count bit is NOT set, leading 2 bits 01 indicates that the bucket count
+  //      value is encoded in 2 bytes.
+  // 0000 - Since the low count bit is Not set these bits are unused, the bucket count will
+  //        be encoded in an additional two bytes.
+  // 0 - Low count bit is NOT set
+  // 000011001 - These 9 bits represent the bucket index of 25
+  // 0000000100000000 - These 16 bits represent the bucket count of 256
+  //
+  // 0000011000001010
+  // 000001 - Low count bit IS set, so these 6-bits represent a bucket count of 1
+  // 1 - Low count bit IS set
+  // 000001010 - These 9 bits represent the bucket index of 10
+  //
+  // 0000011000010001
+  // 000001 - Bucket count of 1
+  // 1 - Low count bit IS set
+  // 000010001 - Bucket index of 17
+  //
+  // 0000111000010011
+  // 000011 - Bucket count of 3
+  // 1 - Low count bit IS set
+  // 000010011 - Bucket index of 19
+  //
+  // 0000101000010101
+  // 000010 - Bucket count of 2
+  // 1 - Low count bit IS set
+  // 000010101 - Bucket index of 21
+  // ------------------------------------------------------------------------------------------
+  private Short2LongOpenHashMap backingMap;
+
+  // The sum of counts in the histogram.
+  // These are accumulated when an entry is added, or when another histogram is merged into this one.
+  private long sumOfCounts = 0;
+
+  static int getMaxIntermdiateHistogramSize()
+  {
+    return PercentileBuckets.length() * MAX_ENTRY_BYTES;
+  }
+
+  @Nullable
+  static SpectatorHistogram deserialize(Object serializedHistogram)
+  {
+    if (serializedHistogram == null) {
+      return null;
+    }
+    if (serializedHistogram instanceof byte[]) {
+      return fromByteBuffer(ByteBuffer.wrap((byte[]) serializedHistogram));
+    }
+    if (serializedHistogram instanceof SpectatorHistogram) {
+      return (SpectatorHistogram) serializedHistogram;
+    }
+    if (serializedHistogram instanceof String) {
+      // Try parse as JSON into HashMap
+      try {
+        HashMap<String, Long> map = JSON_MAPPER.readerFor(HashMap.class).readValue((String) serializedHistogram);
+        SpectatorHistogram histogram = new SpectatorHistogram();
+        for (Map.Entry<String, Long> entry : map.entrySet()) {
+          histogram.add(entry.getKey(), entry.getValue());
+        }
+        return histogram;
+      }
+      catch (JsonProcessingException e) {
+        throw new ParseException((String) serializedHistogram, e, "String cannot be deserialized as JSON to a Spectator Histogram");
+      }
+    }
+    if (serializedHistogram instanceof HashMap) {
+      SpectatorHistogram histogram = new SpectatorHistogram();
+      for (Map.Entry<?, ?> entry : ((HashMap<?, ?>) serializedHistogram).entrySet()) {
+        histogram.add(entry.getKey(), (Number) entry.getValue());
+      }
+      return histogram;
+    }
+    throw new ParseException(
+        null,
+        "Object cannot be deserialized to a Spectator Histogram "
+        + serializedHistogram.getClass()
+    );
+  }
+
+  @Nullable
+  static SpectatorHistogram fromByteBuffer(ByteBuffer buffer)
+  {
+    if (buffer == null || !buffer.hasRemaining()) {
+      return null;
+    }
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    while (buffer.hasRemaining()) {
+      short key = buffer.getShort();
+      short idx = (short) (key & KEY_MASK);
+      long val;
+      if ((key & LOW_COUNT_FLAG) == LOW_COUNT_FLAG) {
+        // Value/count is encoded in the top 6 bits of the short
+        val = (key & VALUE_SIZE_MASK) >>> 10;
+      } else {
+        switch (key & VALUE_SIZE_MASK) {
+          case BYTE_VALUE:
+            val = buffer.get() & 0xFF;
+            break;
+
+          case SHORT_VALUE:
+            val = buffer.getShort() & 0xFFFF;
+            break;
+
+          case INT_VALUE:
+            val = buffer.getInt() & 0xFFFFFFFFL;
+            break;
+
+          default:
+            val = buffer.getLong();
+            break;
+        }
+      }
+
+      histogram.add(idx, val);
+    }
+    if (histogram.isEmpty()) {
+      return null;
+    }
+    return histogram;
+  }
+
+  private Short2LongOpenHashMap writableMap()
+  {
+    if (backingMap == null) {
+      backingMap = new Short2LongOpenHashMap();
+    }
+    return backingMap;
+  }
+
+  private Short2LongMap readableMap()
+  {
+    if (isEmpty()) {
+      return Short2LongMaps.EMPTY_MAP;
+    }
+    return backingMap;
+  }
+
+  @Nullable
+  byte[] toBytes()
+  {
+    if (isEmpty()) {
+      return null;
+    }
+    ByteBuffer buffer = ByteBuffer.allocate(MAX_ENTRY_BYTES * size());
+    for (Short2LongMap.Entry e : Short2LongMaps.fastIterable(readableMap())) {
+      short key = e.getShortKey();
+      long value = e.getLongValue();
+      if (value <= 0x3F) {
+        // Value/count is encoded in the top 6 bits of the key bytes
+        buffer.putShort((short) ((key | LOW_COUNT_FLAG) | ((int) ((value << 10) & VALUE_SIZE_MASK))));
+      } else if (value <= 0xFF) {
+        buffer.putShort((short) (key | BYTE_VALUE));
+        buffer.put((byte) value);
+      } else if (value <= 0xFFFF) {
+        buffer.putShort((short) (key | SHORT_VALUE));
+        buffer.putShort((short) value);
+      } else if (value <= 0xFFFFFFFFL) {
+        buffer.putShort((short) (key | INT_VALUE));
+        buffer.putInt((int) value);
+      } else {
+        buffer.putShort(key);
+        buffer.putLong(value);
+      }
+    }
+    return Arrays.copyOf(buffer.array(), buffer.position());
+  }
+
+  void insert(Number num)
+  {
+    this.add(PercentileBuckets.indexOf(num.longValue()), 1L);
+  }
+
+  void merge(SpectatorHistogram source)
+  {
+    if (source == null) {
+      return;
+    }
+    Short2LongOpenHashMap writableMap = writableMap();
+    for (Short2LongMap.Entry entry : Short2LongMaps.fastIterable(source.readableMap())) {
+      writableMap.addTo(entry.getShortKey(), entry.getLongValue());
+      this.sumOfCounts += entry.getLongValue();
+    }
+  }
+
+  // Exposed for testing
+  void add(int bucket, long count)
+  {
+    if (bucket >= PercentileBuckets.length() || bucket < 0) {
+      throw new IAE("Bucket index out of range (0, " + PercentileBuckets.length() + ")");
+    }
+    writableMap().addTo((short) bucket, count);
+    this.sumOfCounts += count;
+  }
+
+  private void add(Object key, Number value)
+  {
+    if (key instanceof String) {
+      this.add(Integer.parseInt((String) key), value.longValue());

Review Comment:
   ## Missing catch of NumberFormatException
   
   Potential uncaught 'java.lang.NumberFormatException'.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5963)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(3L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(2L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(bigValue)));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesSmallishValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 64L);
+    histogram.add(PercentileBuckets.indexOf(30), 127L);
+    histogram.add(PercentileBuckets.indexOf(40), 111L);
+    histogram.add(PercentileBuckets.indexOf(50), 99L);
+    histogram.add(270, 100L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Byte.BYTES;
+    Assert.assertEquals("Should compact small values to a byte", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(64L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(127L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(111L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(99L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(100L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 512L);
+    histogram.add(PercentileBuckets.indexOf(30), 1024L);
+    histogram.add(PercentileBuckets.indexOf(40), 2048L);
+    histogram.add(PercentileBuckets.indexOf(50), 4096L);
+    histogram.add(270, 8192L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Short.BYTES;
+    Assert.assertEquals("Should compact medium values to short", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(512L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1024L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(2048L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(4096L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(8192L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesLargerValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 100000L);
+    histogram.add(PercentileBuckets.indexOf(30), 200000L);
+    histogram.add(PercentileBuckets.indexOf(40), 500000L);
+    histogram.add(PercentileBuckets.indexOf(50), 10000000L);
+    histogram.add(270, 50000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Integer.BYTES;
+    Assert.assertEquals("Should compact larger values to integer", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(100000L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(200000L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(500000L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(10000000L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(50000000L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesBiggestValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 10000000000L);
+    histogram.add(PercentileBuckets.indexOf(30), 20000000000L);
+    histogram.add(PercentileBuckets.indexOf(40), 50000000000L);
+    histogram.add(PercentileBuckets.indexOf(50), 100000000000L);
+    histogram.add(270, 5000000000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 5180000000000L, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Long.BYTES;
+    Assert.assertEquals("Should not compact larger values", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(10000000000L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(20000000000L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(50000000000L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(100000000000L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(5000000000000L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 5180000000000L, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMixedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 1L);
+    histogram.add(PercentileBuckets.indexOf(30), 300L);
+    histogram.add(PercentileBuckets.indexOf(40), 200000L);
+    histogram.add(PercentileBuckets.indexOf(50), 100000000000L);
+    histogram.add(270, 5000000000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 5100000200301L, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    Assert.assertEquals("Should not compact larger values", (5 * keySize) + 0 + 2 + 4 + 8 + 8, bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5972)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(3L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(2L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(bigValue)));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesSmallishValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 64L);
+    histogram.add(PercentileBuckets.indexOf(30), 127L);
+    histogram.add(PercentileBuckets.indexOf(40), 111L);
+    histogram.add(PercentileBuckets.indexOf(50), 99L);
+    histogram.add(270, 100L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Byte.BYTES;
+    Assert.assertEquals("Should compact small values to a byte", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(64L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(127L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(111L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(99L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(100L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 512L);
+    histogram.add(PercentileBuckets.indexOf(30), 1024L);
+    histogram.add(PercentileBuckets.indexOf(40), 2048L);
+    histogram.add(PercentileBuckets.indexOf(50), 4096L);
+    histogram.add(270, 8192L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Short.BYTES;
+    Assert.assertEquals("Should compact medium values to short", 5 * (keySize + valSize), bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5969)



##########
extensions-contrib/spectator-histogram/src/test/java/org/apache/druid/spectator/histogram/SpectatorHistogramTest.java:
##########
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+
+public class SpectatorHistogramTest
+{
+  @Test
+  public void testToBytesSmallValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.insert(10);
+    histogram.insert(30);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(40);
+    histogram.insert(50);
+    histogram.insert(50);
+    // Check the full range of bucket IDs still work
+    long bigValue = PercentileBuckets.get(270);
+    histogram.insert(bigValue);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = 0;
+    Assert.assertEquals("Should compact small values within key bytes", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(3L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(2L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(bigValue)));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 8, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesSmallishValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 64L);
+    histogram.add(PercentileBuckets.indexOf(30), 127L);
+    histogram.add(PercentileBuckets.indexOf(40), 111L);
+    histogram.add(PercentileBuckets.indexOf(50), 99L);
+    histogram.add(270, 100L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Byte.BYTES;
+    Assert.assertEquals("Should compact small values to a byte", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(64L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(127L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(111L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(99L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(100L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 501, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 512L);
+    histogram.add(PercentileBuckets.indexOf(30), 1024L);
+    histogram.add(PercentileBuckets.indexOf(40), 2048L);
+    histogram.add(PercentileBuckets.indexOf(50), 4096L);
+    histogram.add(270, 8192L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Short.BYTES;
+    Assert.assertEquals("Should compact medium values to short", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(512L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(1024L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(2048L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(4096L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(8192L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 15872, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesLargerValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 100000L);
+    histogram.add(PercentileBuckets.indexOf(30), 200000L);
+    histogram.add(PercentileBuckets.indexOf(40), 500000L);
+    histogram.add(PercentileBuckets.indexOf(50), 10000000L);
+    histogram.add(270, 50000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Integer.BYTES;
+    Assert.assertEquals("Should compact larger values to integer", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(100000L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(200000L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(500000L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(10000000L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(50000000L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 60800000, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesBiggestValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 10000000000L);
+    histogram.add(PercentileBuckets.indexOf(30), 20000000000L);
+    histogram.add(PercentileBuckets.indexOf(40), 50000000000L);
+    histogram.add(PercentileBuckets.indexOf(50), 100000000000L);
+    histogram.add(270, 5000000000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 5180000000000L, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    int valSize = Long.BYTES;
+    Assert.assertEquals("Should not compact larger values", 5 * (keySize + valSize), bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(10000000000L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(20000000000L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(50000000000L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(100000000000L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(5000000000000L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 5180000000000L, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesMixedValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(PercentileBuckets.indexOf(10), 1L);
+    histogram.add(PercentileBuckets.indexOf(30), 300L);
+    histogram.add(PercentileBuckets.indexOf(40), 200000L);
+    histogram.add(PercentileBuckets.indexOf(50), 100000000000L);
+    histogram.add(270, 5000000000000L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 5100000200301L, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    Assert.assertEquals("Should not compact larger values", (5 * keySize) + 0 + 2 + 4 + 8 + 8, bytes.length);
+
+    SpectatorHistogram deserialized = SpectatorHistogram.deserialize(bytes);
+    Assert.assertEquals(1L, deserialized.get(PercentileBuckets.indexOf(10)));
+    Assert.assertEquals(300L, deserialized.get(PercentileBuckets.indexOf(30)));
+    Assert.assertEquals(200000L, deserialized.get(PercentileBuckets.indexOf(40)));
+    Assert.assertEquals(100000000000L, deserialized.get(PercentileBuckets.indexOf(50)));
+    Assert.assertEquals(5000000000000L, deserialized.get(270));
+
+    Assert.assertEquals("Should have size matching number of buckets", 5, deserialized.size());
+    Assert.assertEquals("Should have sum matching number entries", 5100000200301L, deserialized.getSum());
+  }
+
+  @Test
+  public void testToBytesBoundaryValues()
+  {
+    SpectatorHistogram histogram = new SpectatorHistogram();
+    histogram.add(6, 63L);
+    histogram.add(7, 64L);
+    histogram.add(8, 255L);
+    histogram.add(9, 256L);
+    histogram.add(16, 65535L);
+    histogram.add(17, 65536L);
+    histogram.add(32, 4294967295L);
+    histogram.add(33, 4294967296L);
+
+    Assert.assertEquals("Should have size matching number of buckets", 8, histogram.size());
+    Assert.assertEquals("Should have sum matching number entries", 8590066300L, histogram.getSum());
+
+    byte[] bytes = histogram.toBytes();
+    int keySize = Short.BYTES;
+    Assert.assertEquals("Should compact", (8 * keySize) + 0 + 1 + 1 + 2 + 2 + 4 + 4 + 8, bytes.length);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in an invocation context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5973)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "bsyk (via GitHub)" <gi...@apache.org>.
bsyk commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1443326531


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,386 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive numeric values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Fixed buckets with increasing bucket widths. Relative accuracy is maintained,
+but absolute accuracy reduces with larger values.
+
+> If either of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query

Review Comment:
   Where only a single percentile is wanted, often median or 95th, it's slightly nicer to get a single value back, rather than having to extract from an array in the results.
   Also not a strong opinion.
   
   Is the note misleading? It's trying to say, "if you want multiple percentiles from the same underlying metric, then ask for them all at once, rather than as separate metrics". 1 query being more efficient than 2.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1445383669


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+

Review Comment:
   ```suggestion
   ```
   Combine this into the section above. Note that the `>` notation actually makes text smaller and italicized, which may get overlooked.
   
   Example: https://druid.apache.org/docs/latest/operations/durable-storage/ ("Note that only S3 is supported as a durable storage location.")



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1445441959


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries
+These are the upper bounds of each bucket index. There are 276 buckets.
+The first bucket index is 0 and the last bucket index is 275.
+As you can see the bucket widths increase as the bucket index increases. This leads to a greater absolute error for larger values, but maintains a relative error of rough percentage across the number range.
+i.e the maximum error at value 10 is 0 since the bucket width is 1. But for a value of 16,000,000,000 the bucket width is 1,431,655,768 giving an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range (0.1%, 3%).

Review Comment:
   ```suggestion
   For example, the maximum error at value 10 is zero since the bucket width is 1 (the difference of `11-10`). For a value of 16,000,000,000, the bucket width is 1,431,655,768 (from `17179869184-15748213416`). This gives an error of up to ~8.9%, from `1,431,655,768/16,000,000,000*100`. In practice, the observed error of computed percentiles is in the range of (0.1%, 3%).
   ```
   Not sure if it's 11-10 or 10-9. Consider including basic calculations to guide the reader to understand the results.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "suneet-s (via GitHub)" <gi...@apache.org>.
suneet-s commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1440657269


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,386 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations

Review Comment:
   Can you add a limitation here that it is not yet possible to use this via SQL.



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,386 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive numeric values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Fixed buckets with increasing bucket widths. Relative accuracy is maintained,
+but absolute accuracy reduces with larger values.
+
+> If either of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+

Review Comment:
   An example spec of how to ingest the wikipedia dataset with this the spectator histogram would be helpful here.



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,386 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive numeric values within the range of [0, 2^53). Negatives are

Review Comment:
   I think it would be good to call out that decimals are not supported - when I first read numeric values, I just assumed that decimals were supported, but the druid summit talk mentions those are not supported.



##########
docs/configuration/extensions.md:
##########
@@ -76,30 +76,31 @@ If you'd like to take on maintenance for a community extension, please post on [
 
 All of these community extensions can be downloaded using [pull-deps](../operations/pull-deps.md) while specifying a `-c` coordinate option to pull `org.apache.druid.extensions.contrib:{EXTENSION_NAME}:{DRUID_VERSION}`.
 
-|Name|Description|Docs|
-|----|-----------|----|
-|aliyun-oss-extensions|Aliyun OSS deep storage |[link](../development/extensions-contrib/aliyun-oss-extensions.md)|
-|ambari-metrics-emitter|Ambari Metrics Emitter |[link](../development/extensions-contrib/ambari-metrics-emitter.md)|
-|druid-cassandra-storage|Apache Cassandra deep storage.|[link](../development/extensions-contrib/cassandra.md)|
-|druid-cloudfiles-extensions|Rackspace Cloudfiles deep storage and firehose.|[link](../development/extensions-contrib/cloudfiles.md)|
-|druid-compressed-bigdecimal|Compressed Big Decimal Type | [link](../development/extensions-contrib/compressed-big-decimal.md)|
-|druid-distinctcount|DistinctCount aggregator|[link](../development/extensions-contrib/distinctcount.md)|
-|druid-redis-cache|A cache implementation for Druid based on Redis.|[link](../development/extensions-contrib/redis-cache.md)|
-|druid-time-min-max|Min/Max aggregator for timestamp.|[link](../development/extensions-contrib/time-min-max.md)|
-|sqlserver-metadata-storage|Microsoft SQLServer deep storage.|[link](../development/extensions-contrib/sqlserver.md)|
-|graphite-emitter|Graphite metrics emitter|[link](../development/extensions-contrib/graphite.md)|
-|statsd-emitter|StatsD metrics emitter|[link](../development/extensions-contrib/statsd.md)|
-|kafka-emitter|Kafka metrics emitter|[link](../development/extensions-contrib/kafka-emitter.md)|
-|druid-thrift-extensions|Support thrift ingestion |[link](../development/extensions-contrib/thrift.md)|
-|druid-opentsdb-emitter|OpenTSDB metrics emitter |[link](../development/extensions-contrib/opentsdb-emitter.md)|
-|materialized-view-selection, materialized-view-maintenance|Materialized View|[link](../development/extensions-contrib/materialized-view.md)|
-|druid-moving-average-query|Support for [Moving Average](https://en.wikipedia.org/wiki/Moving_average) and other Aggregate [Window Functions](https://en.wikibooks.org/wiki/Structured_Query_Language/Window_functions) in Druid queries.|[link](../development/extensions-contrib/moving-average-query.md)|
-|druid-influxdb-emitter|InfluxDB metrics emitter|[link](../development/extensions-contrib/influxdb-emitter.md)|
-|druid-momentsketch|Support for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library|[link](../development/extensions-contrib/momentsketch-quantiles.md)|
-|druid-tdigestsketch|Support for approximate sketch aggregators based on [T-Digest](https://github.com/tdunning/t-digest)|[link](../development/extensions-contrib/tdigestsketch-quantiles.md)|
-|gce-extensions|GCE Extensions|[link](../development/extensions-contrib/gce-extensions.md)|
-|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)|[link](../development/extensions-contrib/prometheus.md)|
-|kubernetes-overlord-extensions|Support for launching tasks in k8s without Middle Managers|[link](../development/extensions-contrib/k8s-jobs.md)|
+| Name                                                       | Description                                                                                                                                                                                                   | Docs                                                                 |
+|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
+| aliyun-oss-extensions                                      | Aliyun OSS deep storage                                                                                                                                                                                       | [link](../development/extensions-contrib/aliyun-oss-extensions.md)   |
+| ambari-metrics-emitter                                     | Ambari Metrics Emitter                                                                                                                                                                                        | [link](../development/extensions-contrib/ambari-metrics-emitter.md)  |
+| druid-cassandra-storage                                    | Apache Cassandra deep storage.                                                                                                                                                                                | [link](../development/extensions-contrib/cassandra.md)               |
+| druid-cloudfiles-extensions                                | Rackspace Cloudfiles deep storage and firehose.                                                                                                                                                               | [link](../development/extensions-contrib/cloudfiles.md)              |
+| druid-compressed-bigdecimal                                | Compressed Big Decimal Type                                                                                                                                                                                   | [link](../development/extensions-contrib/compressed-big-decimal.md)  |
+| druid-distinctcount                                        | DistinctCount aggregator                                                                                                                                                                                      | [link](../development/extensions-contrib/distinctcount.md)           |
+| druid-redis-cache                                          | A cache implementation for Druid based on Redis.                                                                                                                                                              | [link](../development/extensions-contrib/redis-cache.md)             |
+| druid-time-min-max                                         | Min/Max aggregator for timestamp.                                                                                                                                                                             | [link](../development/extensions-contrib/time-min-max.md)            |
+| sqlserver-metadata-storage                                 | Microsoft SQLServer deep storage.                                                                                                                                                                             | [link](../development/extensions-contrib/sqlserver.md)               |
+| graphite-emitter                                           | Graphite metrics emitter                                                                                                                                                                                      | [link](../development/extensions-contrib/graphite.md)                |
+| statsd-emitter                                             | StatsD metrics emitter                                                                                                                                                                                        | [link](../development/extensions-contrib/statsd.md)                  |
+| kafka-emitter                                              | Kafka metrics emitter                                                                                                                                                                                         | [link](../development/extensions-contrib/kafka-emitter.md)           |
+| druid-thrift-extensions                                    | Support thrift ingestion                                                                                                                                                                                      | [link](../development/extensions-contrib/thrift.md)                  |
+| druid-opentsdb-emitter                                     | OpenTSDB metrics emitter                                                                                                                                                                                      | [link](../development/extensions-contrib/opentsdb-emitter.md)        |
+| materialized-view-selection, materialized-view-maintenance | Materialized View                                                                                                                                                                                             | [link](../development/extensions-contrib/materialized-view.md)       |
+| druid-moving-average-query                                 | Support for [Moving Average](https://en.wikipedia.org/wiki/Moving_average) and other Aggregate [Window Functions](https://en.wikibooks.org/wiki/Structured_Query_Language/Window_functions) in Druid queries. | [link](../development/extensions-contrib/moving-average-query.md)    |
+| druid-influxdb-emitter                                     | InfluxDB metrics emitter                                                                                                                                                                                      | [link](../development/extensions-contrib/influxdb-emitter.md)        |
+| druid-momentsketch                                         | Support for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library                                                                                | [link](../development/extensions-contrib/momentsketch-quantiles.md)  |
+| druid-tdigestsketch                                        | Support for approximate sketch aggregators based on [T-Digest](https://github.com/tdunning/t-digest)                                                                                                          | [link](../development/extensions-contrib/tdigestsketch-quantiles.md) |
+| gce-extensions                                             | GCE Extensions                                                                                                                                                                                                | [link](../development/extensions-contrib/gce-extensions.md)          |
+| prometheus-emitter                                         | Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)                                                                                                   | [link](../development/extensions-contrib/prometheus.md)              |
+| kubernetes-overlord-extensions                             | Support for launching tasks in k8s without Middle Managers                                                                                                                                                    | [link](../development/extensions-contrib/k8s-jobs.md)                |
+| druid-spectator-histogram                                  | Support for efficient approximate percentile queries                                                                                                                                                          | [link](../development/extensions-contrib/spectator-histogram.md)     |

Review Comment:
   Can you update your editor to undo the formatting changes to this table please.
   
   https://github.com/apache/druid/blob/master/dev/druid_intellij_formatting.xml#L77-L80 - This was recently added to the druid_intellij_formatting.xml file- so if you re-import it, the formatter should no longer update the tables when you edit them.



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,386 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive numeric values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Fixed buckets with increasing bucket widths. Relative accuracy is maintained,
+but absolute accuracy reduces with larger values.

Review Comment:
   Can you explain the accuracy tradeoff here vs other sketch implementations. 
   
   I don't understand what absolute accuracy reduces with larger values means. Maybe an example in the docs will help clear it up.
   
   I think that sort of information will be helpful for users to decide which sketch implementation to use for their use case.



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,386 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive numeric values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Fixed buckets with increasing bucket widths. Relative accuracy is maintained,
+but absolute accuracy reduces with larger values.
+
+> If either of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query

Review Comment:
   nit: Given this note, would it be a nicer UX if the extension did not provide a way to get a single percentile. If users want to get a single percentile, they could pass in an array with one element.
   
   I don't have a strong opinion on this, so if you think having both functions is better - that's fine with me too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1446427421


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider SpectatorHistogram to compute percentile approximations. This extension has a reduced storage footprint compared to the [DataSketches extension](../extensions-core/datasketches-extension.md), which results in smaller segment sizes, faster loading from deep storage, and lower memory usage. This extension provides fast and accurate queries on large datasets at low storage cost.
+
+This aggregator only applies when your raw data contains positive long integer values. Do not use this aggregator if you have negative values in your data.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* `wikipedia` contains the dataset ingested as is, without rollup
+* `wikipedia_spectator` contains the dataset with a single extra metric column of type `spectatorHistogram` for the `added` column
+* `wikipedia_datasketch` contains the dataset with a single extra metric column of type `quantilesDoublesSketch` for the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the `quantilesDoublesSketch`
+adds 48 bytes per row. This represents an eightfold reduction in additional storage size for spectator histograms.
+
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size savings. For example, when you ingest the Wikipedia dataset
+with day-grain query granularity and remove all dimensions except `countryName`,
+this results in a segment that has just 106 rows. The base segment has 87 bytes per row.
+Compare the following bytes per row for SpectatorHistogram versus DataSketches:
+* An additional `spectatorHistogram` column adds 27 bytes per row on average.
+* An additional `quantilesDoublesSketch` column adds 255 bytes per row.
+
+SpectatorHistogram reduces the additional storage size by 9.4 times in this example.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in DataSketches `quantilesDoublesSketch` aggregator, but is
+opinionated and optimized for typical measurements from cloud services and web apps.
+For example, measurements such as page load time, transferred bytes, response time, and request latency.

Review Comment:
   ```suggestion
   It provides similar functionality to the built-in DataSketches `quantilesDoublesSketch` aggregator, but is
   opinionated to maintain higher accuracy at smaller values.
   See [Bucket boundaries](#histogram-bucket-boundaries) for more information and an example.
   The SpectatorHistogram is optimized for typical measurements from cloud services and web apps,
   such as measurements such as page load time, transferred bytes, response time, and request latency.
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1442810055


##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogramModule.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.databind.Module;
+import com.fasterxml.jackson.databind.jsontype.NamedType;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
+import com.google.inject.Binder;
+import org.apache.druid.initialization.DruidModule;
+import org.apache.druid.segment.serde.ComplexMetrics;
+
+import java.util.List;
+
+/**
+ * Module defining various aggregators for Spectator Histograms
+ */
+public class SpectatorHistogramModule implements DruidModule
+{
+  @VisibleForTesting
+  public static void registerSerde()
+  {
+    ComplexMetrics.registerSerde(
+        SpectatorHistogramAggregatorFactory.TYPE_NAME,
+        new SpectatorHistogramComplexMetricSerde(SpectatorHistogramAggregatorFactory.TYPE_NAME)
+    );
+    ComplexMetrics.registerSerde(
+        SpectatorHistogramAggregatorFactory.Timer.TYPE_NAME,
+        new SpectatorHistogramComplexMetricSerde(SpectatorHistogramAggregatorFactory.Timer.TYPE_NAME)
+    );
+    ComplexMetrics.registerSerde(
+        SpectatorHistogramAggregatorFactory.Distribution.TYPE_NAME,
+        new SpectatorHistogramComplexMetricSerde(SpectatorHistogramAggregatorFactory.Distribution.TYPE_NAME)
+    );
+  }
+
+  @Override
+  public List<? extends Module> getJacksonModules()
+  {
+    return ImmutableList.of(
+        new SimpleModule(
+            getClass().getSimpleName()
+        ).registerSubtypes(
+            new NamedType(
+                SpectatorHistogramAggregatorFactory.class,
+                SpectatorHistogramAggregatorFactory.TYPE_NAME
+            ),
+            new NamedType(
+                SpectatorHistogramAggregatorFactory.Timer.class,
+                SpectatorHistogramAggregatorFactory.Timer.TYPE_NAME
+            ),
+            new NamedType(
+                SpectatorHistogramAggregatorFactory.Distribution.class,
+                SpectatorHistogramAggregatorFactory.Distribution.TYPE_NAME
+            ),
+            new NamedType(
+                SpectatorHistogramPercentilePostAggregator.class,
+                SpectatorHistogramPercentilePostAggregator.TYPE_NAME
+            ),
+            new NamedType(
+                SpectatorHistogramPercentilesPostAggregator.class,
+                SpectatorHistogramPercentilesPostAggregator.TYPE_NAME
+            )
+        ).addSerializer(SpectatorHistogram.class, new SpectatorHistogramJsonSerializer())
+    );
+  }
+
+  @Override
+  public void configure(Binder binder)
+  {
+    registerSerde();
+    //TODO: samarth this probably needs to be added for sql

Review Comment:
   This could either be removed to a comment till sql support is added



##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/NullableOffsetsHeader.java:
##########
@@ -0,0 +1,378 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.io.Channels;
+import org.apache.druid.java.util.common.io.smoosh.FileSmoosher;
+import org.apache.druid.segment.serde.Serializer;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.WriteOutBytes;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.channels.WritableByteChannel;
+import java.util.BitSet;
+import java.util.Objects;
+
+public class NullableOffsetsHeader implements Serializer

Review Comment:
   nit: A javadoc for this class as a small summary of its usages would help make the code more readable



##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogram.java:
##########
@@ -0,0 +1,423 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializerProvider;
+import com.netflix.spectator.api.histogram.PercentileBuckets;
+import it.unimi.dsi.fastutil.shorts.Short2LongMap;
+import it.unimi.dsi.fastutil.shorts.Short2LongMaps;
+import it.unimi.dsi.fastutil.shorts.Short2LongOpenHashMap;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.jackson.JacksonUtils;
+import org.apache.druid.java.util.common.parsers.ParseException;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+// Since queries don't come from SpectatorHistogramAggregator in the case of
+// using longSum or doubleSum aggregations. They come from LongSumBufferAggregator.
+// Therefore, we extended Number here.
+// This will prevent class casting exceptions if trying to query with sum rather
+// than explicitly as a SpectatorHistogram
+//
+// The SpectatorHistorgram is a Number. That number is of intValue(),

Review Comment:
   SpectatorHistogram



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on PR #15340:
URL: https://github.com/apache/druid/pull/15340#issuecomment-1820228088

   Thanks for the PR! This looks like a really cool addition to Druid. I'm going through the PR, and will add comments after it's done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "bsyk (via GitHub)" <gi...@apache.org>.
bsyk commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1443335975


##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogramAggregatorFactory.java:
##########
@@ -0,0 +1,373 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.druid.query.aggregation.AggregateCombiner;
+import org.apache.druid.query.aggregation.Aggregator;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.AggregatorFactoryNotMergeableException;
+import org.apache.druid.query.aggregation.AggregatorUtil;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.aggregation.ObjectAggregateCombiner;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(SpectatorHistogramAggregatorFactory.TYPE_NAME)
+public class SpectatorHistogramAggregatorFactory extends AggregatorFactory
+{
+
+  @Nonnull
+  private final String name;
+  @Nonnull
+  private final String fieldName;
+
+  @Nonnull
+  private final byte cacheTypeId;
+
+  public static final String TYPE_NAME = "spectatorHistogram";
+
+  @JsonCreator
+  public SpectatorHistogramAggregatorFactory(
+      @JsonProperty("name") final String name,
+      @JsonProperty("fieldName") final String fieldName
+  )
+  {
+    this(name, fieldName, AggregatorUtil.SPECTATOR_HISTOGRAM_CACHE_TYPE_ID);
+  }
+
+  public SpectatorHistogramAggregatorFactory(
+      final String name,
+      final String fieldName,
+      final byte cacheTypeId
+  )
+  {
+    this.name = Objects.requireNonNull(name, "Must have a valid, non-null aggregator name");
+    this.fieldName = Objects.requireNonNull(fieldName, "Parameter fieldName must be specified");
+    this.cacheTypeId = cacheTypeId;
+  }
+
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(
+        cacheTypeId
+    ).appendString(fieldName).build();
+  }
+
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+    return new SpectatorHistogramAggregator(metricFactory.makeColumnValueSelector(fieldName));
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
+  {
+    return new SpectatorHistogramBufferAggregator(metricFactory.makeColumnValueSelector(fieldName));
+  }
+
+  // This is used when writing metrics to segment files to check whether the column is sorted.
+  // Since there is no sensible way really to compare histograms, compareTo always returns 1.
+  public static final Comparator<SpectatorHistogram> COMPARATOR = (o, o1) -> {
+    if (o == null && o1 == null) {
+      return 0;
+    } else if (o != null && o1 == null) {
+      return -1;
+    } else if (o == null) {
+      return 1;
+    }
+    return Integer.compare(o.hashCode(), o1.hashCode());
+  };
+
+  @Override
+  public Comparator getComparator()
+  {
+    return COMPARATOR;
+  }
+
+  @Override
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)
+  {
+    if (lhs == null) {
+      return rhs;
+    }
+    if (rhs == null) {
+      return lhs;
+    }
+    SpectatorHistogram lhsHisto = (SpectatorHistogram) lhs;
+    SpectatorHistogram rhsHisto = (SpectatorHistogram) rhs;
+    lhsHisto.merge(rhsHisto);
+    return lhsHisto;
+  }
+
+  @Override
+  public AggregatorFactory getCombiningFactory()
+  {
+    return new SpectatorHistogramAggregatorFactory(name, name);
+  }
+
+  @Override
+  public AggregatorFactory getMergingFactory(AggregatorFactory other) throws AggregatorFactoryNotMergeableException
+  {
+    if (other.getName().equals(this.getName()) && this.getClass() == other.getClass()) {
+      return getCombiningFactory();
+    } else {
+      throw new AggregatorFactoryNotMergeableException(this, other);
+    }
+  }
+
+  @Override
+  public List<AggregatorFactory> getRequiredColumns()
+  {
+    return Collections.singletonList(
+        new SpectatorHistogramAggregatorFactory(
+            fieldName,
+            fieldName
+        )
+    );
+  }
+
+  @Override
+  public Object deserialize(Object serializedHistogram)
+  {
+    return SpectatorHistogram.deserialize(serializedHistogram);
+  }
+
+  @Nullable
+  @Override
+  public Object finalizeComputation(@Nullable Object object)
+  {
+    return object;
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @JsonProperty
+  public String getFieldName()
+  {
+    return fieldName;
+  }
+
+  @Override
+  public List<String> requiredFields()
+  {
+    return Collections.singletonList(fieldName);
+  }
+
+  @Override
+  public String getComplexTypeName()
+  {
+    return TYPE_NAME;
+  }
+
+  @Override
+  public ValueType getType()
+  {
+    return ValueType.COMPLEX;
+  }
+
+  @Override
+  public ValueType getFinalizedType()
+  {
+    return ValueType.COMPLEX;
+  }
+
+  @Override
+  public int getMaxIntermediateSize()
+  {
+    return SpectatorHistogram.getMaxIntermdiateHistogramSize();
+  }
+
+  @Override
+  public AggregateCombiner makeAggregateCombiner()
+  {
+    return new ObjectAggregateCombiner<SpectatorHistogram>()
+    {
+      private SpectatorHistogram combined = null;
+
+      @Override
+      public void reset(final ColumnValueSelector selector)
+      {
+        combined = null;
+        fold(selector);
+      }
+
+      @Override
+      public void fold(final ColumnValueSelector selector)
+      {
+        SpectatorHistogram other = (SpectatorHistogram) selector.getObject();
+        if (other == null) {
+          return;
+        }
+        if (combined == null) {
+          combined = new SpectatorHistogram();
+        }
+        combined.merge(other);
+      }
+
+      @Nullable
+      @Override
+      public SpectatorHistogram getObject()
+      {
+        return combined;
+      }
+
+      @Override
+      public Class<SpectatorHistogram> classOfObject()
+      {
+        return SpectatorHistogram.class;
+      }
+    };
+  }
+
+  @Override
+  public boolean equals(final Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || !getClass().equals(o.getClass())) {
+      return false;
+    }
+    final SpectatorHistogramAggregatorFactory that = (SpectatorHistogramAggregatorFactory) o;
+
+    //TODO: samarth should we check for equality of contents in count arrays?

Review Comment:
   This was left over from an earlier implementation and no longer relevant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1445441959


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries
+These are the upper bounds of each bucket index. There are 276 buckets.
+The first bucket index is 0 and the last bucket index is 275.
+As you can see the bucket widths increase as the bucket index increases. This leads to a greater absolute error for larger values, but maintains a relative error of rough percentage across the number range.
+i.e the maximum error at value 10 is 0 since the bucket width is 1. But for a value of 16,000,000,000 the bucket width is 1,431,655,768 giving an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range (0.1%, 3%).

Review Comment:
   ```suggestion
   For example, the maximum error at value 10 is zero since the bucket width is 1 (the difference of 11-10). For a value of 16,000,000,000, the bucket width is 1,431,655,768 (17179869184-15748213416). This gives an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range of (0.1%, 3%).
   ```
   Not sure if it's 11-10 or 10-9. Also it's not clear how you get 8.9%. Consider explaining in more detail, or linking to relevant docs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "bsyk (via GitHub)" <gi...@apache.org>.
bsyk commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1445530223


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.

Review Comment:
   There are opinions built into the implementation, the main one being that smaller values should be more accurately recorded. The opinion that we can afford to be off by a few million once we're into the many-millions range, but don't want to be off by more than 1 at the very small numbers.
   Happy to try to rephrase.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1446427421


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider SpectatorHistogram to compute percentile approximations. This extension has a reduced storage footprint compared to the [DataSketches extension](../extensions-core/datasketches-extension.md), which results in smaller segment sizes, faster loading from deep storage, and lower memory usage. This extension provides fast and accurate queries on large datasets at low storage cost.
+
+This aggregator only applies when your raw data contains positive long integer values. Do not use this aggregator if you have negative values in your data.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* `wikipedia` contains the dataset ingested as is, without rollup
+* `wikipedia_spectator` contains the dataset with a single extra metric column of type `spectatorHistogram` for the `added` column
+* `wikipedia_datasketch` contains the dataset with a single extra metric column of type `quantilesDoublesSketch` for the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the `quantilesDoublesSketch`
+adds 48 bytes per row. This represents an eightfold reduction in additional storage size for spectator histograms.
+
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size savings. For example, when you ingest the Wikipedia dataset
+with day-grain query granularity and remove all dimensions except `countryName`,
+this results in a segment that has just 106 rows. The base segment has 87 bytes per row.
+Compare the following bytes per row for SpectatorHistogram versus DataSketches:
+* An additional `spectatorHistogram` column adds 27 bytes per row on average.
+* An additional `quantilesDoublesSketch` column adds 255 bytes per row.
+
+SpectatorHistogram reduces the additional storage size by 9.4 times in this example.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in DataSketches `quantilesDoublesSketch` aggregator, but is
+opinionated and optimized for typical measurements from cloud services and web apps.
+For example, measurements such as page load time, transferred bytes, response time, and request latency.

Review Comment:
   ```suggestion
   It provides similar functionality to the built-in DataSketches `quantilesDoublesSketch` aggregator, but is
   opinionated to maintain higher absolute accuracy at smaller values.
   Larger values have lower absolute accuracy; however, relative accuracy is maintained across the range.
   See [Bucket boundaries](#histogram-bucket-boundaries) for more information.
   The SpectatorHistogram is optimized for typical measurements from cloud services and web apps,
   such as page load time, transferred bytes, response time, and request latency.
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "vtlim (via GitHub)" <gi...@apache.org>.
vtlim commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1445387526


##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements

Review Comment:
   The descriptors here don't need to be in a bulleted list. Consider the following for readability.
   ```suggestion
   Consider SpectatorHistogram to compute percentile approximations. This extension has a reduced storage footprint compared to the [DataSketches extension](../extensions-core/datasketches-extension.md), which results in smaller segment sizes, faster loading from deep storage, and lower memory usage. This extension provides fast and accurate queries on large datasets at low storage cost.
   
   This aggregator only applies when your raw data contains positive long integer values. Do not use this aggregator if you have negative values in your data.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is

Review Comment:
   Refer to "DataSketch" in line with the docs (here and elsewhere)
   https://druid.apache.org/docs/latest/development/extensions-core/datasketches-extension/



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using

Review Comment:
   ```suggestion
   values as well as aggregating or combining pre-aggregated histograms generated using
   ```
   Use "or" or "and" instead of "/" which can be ambiguous



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested

Review Comment:
   ```suggestion
   The results contain arrays matching the length and order of the requested
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data

Review Comment:
   Capitalize "Wikipedia" when used in text (here and elsewhere)



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.

Review Comment:
   ```suggestion
   adds 48 bytes per row. This represents an eightfold reduction in additional storage size for spectator histograms.
   
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 

Review Comment:
   ```suggestion
   `percentilesSpectatorHistogram` (plural), to compute approximate 
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column

Review Comment:
   ```suggestion
   In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
   * `wikipedia` contains the dataset ingested as is, without rollup
   * `wikipedia_spectator` contains the dataset with a single extra metric column of type `spectatorHistogram` for the `added` column
   * `wikipedia_datasketch` contains the dataset with a single extra metric column of type `quantilesDoublesSketch` for the `added` column
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch

Review Comment:
   ```suggestion
   Spectator histograms average just 6 extra bytes per row, while the DataSketch
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data

Review Comment:
   ```suggestion
   As rollup improves, so does the size savings. For example, when you ingest the Wikipedia dataset
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.

Review Comment:
   ```suggestion
   For example, measurements such as page load time, transferred bytes, response time, and request latency.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.

Review Comment:
   ```suggestion
   with day-grain query granularity and remove all dimensions except `countryName`,
   this results in a segment that has just 106 rows. The base segment has 87 bytes per row.
   Compare the following bytes per row for SpectatorHistogram versus DataSketches:
   * An additional `spectatorHistogram` column adds 27 bytes per row on average.
   * An additional `quantilesDoublesSketch` column adds 255 bytes per row.
   
   SpectatorHistogram reduces the additional storage size by 9.4 times in this example.
   Storage gains will differ per dataset depending on the variance and rollup of the data.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.

Review Comment:
   What does "opinionated" mean in this context?



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.

Review Comment:
   ```suggestion
   opinionated and optimized for typical measurements from cloud services and web apps.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact

Review Comment:
   ```suggestion
   Through some trade-offs SpectatorHistogram provides a significantly more compact
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric

Review Comment:
   ```suggestion
   The SpectatorHistogram aggregator can generate histograms from raw numeric
   ```
   Prefer active voice



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).

Review Comment:
   ```suggestion
   data-sketches. Note that results depend on the dataset.
   Also see the [limitations](#limitations] of this extension.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.

Review Comment:
   ```suggestion
   :::tip
   If these limitations don't work for your use case, then use [DataSketches](../extensions-core/datasketches-extension.md) instead.
   :::
   ```
   Uses Docusaurus-style admonitions to capture attention https://docusaurus.io/docs/2.x/markdown-features/admonitions



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.

Review Comment:
   ```suggestion
   histogram. The keys need not be ordered or contiguous. For example:
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.

Review Comment:
   ```suggestion
   underlying implementation. If you use the Atlas-Druid service, the different types
   signal the service on how to handle the resulting data from a query.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.

Review Comment:
   It seems DataSketches also need to be loaded. Did you mean "download"?
   https://druid.apache.org/docs/latest/development/extensions-core/datasketches-extension/
   ```
   druid.extensions.loadList=["druid-datasketches"]
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,

Review Comment:
   ```suggestion
   and accuracy are comparable. However, the DataSketch aggregator supports negative values,
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.

Review Comment:
   ```suggestion
   incorrect to generate histograms from already rolled-up summed data.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.

Review Comment:
   ```suggestion
   * Supports positive long integer values within the range of [0, 2^53). Negatives are
   coerced to 0.
   * Does not support decimals.
   * Does not support Druid SQL queries, only native queries.
   * Does not support vectorized queries.
   * Generates 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles ranges from 0.1% to 3%, exclusive. See [Bucket boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
   ```
   Start each bullet in the same way (verb or noun)



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.

Review Comment:
   ```suggestion
   See [Histogram bucket boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries

Review Comment:
   ```suggestion
   ## Histogram bucket boundaries
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query

Review Comment:
   ```suggestion
   > It's more efficient to request multiple percentiles in a single query
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data

Review Comment:
   Prefer "you" instead of "we" in docs. https://github.com/apache/druid/blob/master/docs/development/docs-contribute.md#style-checklist



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix

Review Comment:
   We don't typically include an appendix in docs. Consider making this "Examples" and promoting the heading for bucket boundaries.
   ```suggestion
   ## Examples
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.

Review Comment:
   Add new line before and after figures



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries
+These are the upper bounds of each bucket index. There are 276 buckets.

Review Comment:
   ```suggestion
   The following array lists the upper bounds of each bucket index. There are 276 buckets in total.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries
+These are the upper bounds of each bucket index. There are 276 buckets.
+The first bucket index is 0 and the last bucket index is 275.
+As you can see the bucket widths increase as the bucket index increases. This leads to a greater absolute error for larger values, but maintains a relative error of rough percentage across the number range.

Review Comment:
   ```suggestion
   The bucket widths increase as the bucket index increases. This leads to a greater absolute error for larger values, but maintains a relative error of rough percentage across the number range.
   ```



##########
docs/development/extensions-contrib/spectator-histogram.md:
##########
@@ -0,0 +1,453 @@
+---
+id: spectator-histogram
+title: "Spectator Histogram module"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Summary
+This module provides Apache Druid approximate histogram aggregators and percentile
+post-aggregators based on Spectator fixed-bucket histograms.
+
+Consider using this extension if you need percentile approximations and:
+* want fast and accurate queries
+* at a lower storage cost
+* and have a large dataset
+* using only positive measurements
+
+> The main benefit of this extension over data-sketches is the reduced storage
+footprint. Which leads to smaller segment sizes, faster loading from deep storage
+and lower memory usage.
+
+In the Druid instance shown below, the example Wikipedia dataset is loaded 3 times.
+* As-is, no rollup applied
+* With a single extra metric column of type `spectatorHistogram` ingesting the `added` column
+* With a single extra metric column of type `quantilesDoublesSketch` ingesting the `added` column
+
+Spectator histograms average just 6 extra bytes per row, while the data-sketch
+adds 48 bytes per row. This is an 8 x reduction in additional storage size.
+![Comparison of datasource sizes in web console](../../assets/spectator-histogram-size-comparison.png)
+
+As rollup improves, so does the size saving. For example, ingesting the wikipedia data
+with day-grain query granularity and removing all dimensions except `countryName`,
+we get to a segment that has just 106 rows. The base segment is 87 bytes per row,
+adding a single `spectatorHistogram` column adds just 27 bytes per row on average vs
+`quantilesDoublesSketch` adding 255 bytes per row. This is a 9.4 x reduction in additional storage size.
+Storage gains will differ per dataset depending on the variance and rollup of the data.
+
+## Background
+[Spectator](https://netflix.github.io/atlas-docs/spectator/) is a simple library
+for instrumenting code to record dimensional time series data.
+It was built, primarily, to work with [Atlas](https://netflix.github.io/atlas-docs/).
+Atlas was developed by Netflix to manage dimensional time series data for near
+real-time operational insight.
+
+With the [Atlas-Druid](https://github.com/Netflix-Skunkworks/iep-apps/tree/main/atlas-druid)
+service, it's possible to use the power of Atlas queries, backed by Druid as a
+data store to benefit from high-dimensionality and high-cardinality data.
+
+SpectatorHistogram is designed for efficient parallel aggregations while still
+allowing for filtering and grouping by dimensions. 
+It provides similar functionality to the built-in data-sketch aggregator, but is
+opinionated and optimized for typical measurements of cloud services and web-apps.
+Measurements such as page load time, transferred bytes, response time, request latency, etc.
+Through some trade-offs we're able to provide a significantly more compact
+representation with the same aggregation performance and accuracy as
+data-sketches (depending on data-set, see limitations below).
+
+## Limitations
+* Supports positive long integer values within the range of [0, 2^53). Negatives are
+coerced to 0.
+* Decimals are not supported.
+* 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles is in the range (0.1%, 3%). See [Bucket Boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+* DruidSQL queries are yet not supported. You must use native Druid queries.
+* Vectorized queries are yet not supported.
+
+> If any of these limitations are a problem, then the data-sketch aggregator
+is most likely a better choice.
+
+## Functionality
+The SpectatorHistogram aggregator is capable of generating histograms from raw numeric
+values as well as aggregating/combining pre-aggregated histograms generated using
+the SpectatorHistogram aggregator itself.
+While you can generate histograms on the fly at query time, it is generally more
+performant to generate histograms during ingestion and then combine them at
+query time. This is especially true where rollup is enabled. It may be misleading or 
+incorrect to generate histogram from already rolled-up summed data.
+
+The module provides postAggregators, `percentileSpectatorHistogram` (singular) and
+`percentilesSpectatorHistogram` (plural), that can be used to compute approximate 
+percentiles from histograms generated by the SpectatorHistogram aggregator.
+Again, these postAggregators can be used to compute percentiles from raw numeric
+values via the SpectatorHistogram aggregator or from pre-aggregated histograms.
+
+> If you're only using the aggregator to compute percentiles from raw numeric values,
+then you can use the built-in data-sketch aggregator instead. The performance
+and accuracy are comparable, the data-sketch aggregator supports negative values,
+and you don't need to load an additional extension.
+ 
+An aggregated SpectatorHistogram can also be queried using a `longSum` or `doubleSum`
+aggregator to retrieve the population of the histogram. This is effectively the count
+of the number of values that were aggregated into the histogram. This flexibility can
+avoid the need to maintain a separate metric for the count of values.
+
+For high-frequency measurements, you may need to pre-aggregate data at the client prior
+to sending into Druid. For example, if you're measuring individual image render times
+on an image-heavy website, you may want to aggregate the render times for a page-view
+into a single histogram prior to sending to Druid in real-time. This can reduce the
+amount of data that's needed to send from the client across the wire.
+
+SpectatorHistogram supports ingesting pre-aggregated histograms in real-time and batch.
+They can be sent as a JSON map, keyed by the spectator bucket ID and the value is the
+count of values. This is the same format as the serialized JSON representation of the
+histogram. The keys need not be ordered or contiguous e.g.
+
+```json
+{ "4":  8, "5": 15, "6": 37, "7": 9, "8": 3, "10": 1, "13": 1 }
+```
+
+## Loading the extension
+To use SpectatorHistogram, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-spectator-histogram"]
+```
+
+## Aggregators
+
+The result of the aggregation is a histogram that is built by ingesting numeric values from
+the raw data, or from combining pre-aggregated histograms. The result is represented in 
+JSON format where the keys are the bucket index and the values are the count of entries
+in that bucket.
+
+The buckets are defined as per the Spectator [PercentileBuckets](https://github.com/Netflix/spectator/blob/main/spectator-api/src/main/java/com/netflix/spectator/api/histogram/PercentileBuckets.java) specification.
+See [Appendix](#histogram-bucket-boundaries) for the full list of bucket boundaries.
+```js
+  // The set of buckets is generated by using powers of 4 and incrementing by one-third of the
+  // previous power of 4 in between as long as the value is less than the next power of 4 minus
+  // the delta.
+  //
+  // Base: 1, 2, 3
+  //
+  // 4 (4^1), delta = 1 (~1/3 of 4)
+  //     5, 6, 7, ..., 14,
+  //
+  // 16 (4^2), delta = 5 (~1/3 of 16)
+  //    21, 26, 31, ..., 56,
+  //
+  // 64 (4^3), delta = 21 (~1/3 of 64)
+  // ...
+```
+
+There are multiple aggregator types included, all of which are based on the same
+underlying implementation. The different types signal to the Atlas-Druid service (if using)
+how to handle the resulting data from a query.
+
+* spectatorHistogramTimer signals that the histogram is representing
+a collection of timer values. It is recommended to normalize timer values to nanoseconds
+at, or prior to, ingestion. If queried via the Atlas-Druid service, it will
+normalize timers to second resolution at query time as a more natural unit of time
+for human consumption.
+* spectatorHistogram and spectatorHistogramDistribution are generic histograms that
+can be used to represent any measured value without units. No normalization is
+required or performed.
+
+### `spectatorHistogram` aggregator
+Alias: `spectatorHistogramDistribution`, `spectatorHistogramTimer`
+
+To aggregate at query time:
+```
+{
+  "type" : "spectatorHistogram",
+  "name" : <output_name>,
+  "fieldName" : <column_name>
+ }
+```
+
+| Property  | Description                                                                                                  | Required? |
+|-----------|--------------------------------------------------------------------------------------------------------------|-----------|
+| type      | This String must be one of "spectatorHistogram", "spectatorHistogramTimer", "spectatorHistogramDistribution" | yes       |
+| name      | A String for the output (result) name of the aggregation.                                                    | yes       |
+| fieldName | A String for the name of the input field containing raw numeric values or pre-aggregated histograms.         | yes       |
+
+### `longSum`, `doubleSum` and `floatSum` aggregators
+To get the population size (count of events contributing to the histogram):
+```
+{
+  "type" : "longSum",
+  "name" : <output_name>,
+  "fieldName" : <column_name_of_aggregated_histogram>
+ }
+```
+
+| Property  | Description                                                                    | Required? |
+|-----------|--------------------------------------------------------------------------------|-----------|
+| type      | Must be "longSum", "doubleSum", or "floatSum".                                 | yes       |
+| name      | A String for the output (result) name of the aggregation.                      | yes       |
+| fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes       |
+
+## Post Aggregators
+
+### Percentile (singular)
+This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
+
+```
+{
+  "type": "percentileSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentile": <decimal percentile, e.g. 50.0 for median>
+}
+```
+
+| Property   | Description                                                 | Required? |
+|------------|-------------------------------------------------------------|-----------|
+| type       | This String should always be "percentileSpectatorHistogram" | yes       |
+| name       | A String for the output (result) name of the calculation.   | yes       |
+| field      | A field reference pointing to the aggregated histogram.     | yes       |
+| percentile | A single decimal percentile between 0.0 and 100.0           | yes       |
+
+### Percentiles (multiple)
+This returns an array of percentiles corresponding to those requested.
+
+```
+{
+  "type": "percentilesSpectatorHistogram",
+  "name": <output name>,
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": <name of aggregated SpectatorHistogram>
+  },
+  "percentiles": [25, 50, 75, 99.5]
+}
+```
+
+> Note: It's more efficient to request multiple percentiles in a single query
+than to request individual percentiles in separate queries. This array-based
+helper is provided for convenience and has a marginal performance benefit over
+using the singular percentile post-aggregator multiple times within a query.
+The more expensive part of the query is the aggregation of the histogram.
+The post-aggregation calculations all happen on the same aggregated histogram.
+
+Results will contain arrays matching the length and order of the requested
+array of percentiles.
+
+```
+"percentilesAdded": [
+    0.5504911679884643, // 25th percentile
+    4.013975155279504,  // 50th percentile 
+    78.89518317503394,  // 75th percentile
+    8580.024999999994   // 99.5th percentile
+]
+```
+
+| Property    | Description                                                  | Required? |
+|-------------|--------------------------------------------------------------|-----------|
+| type        | This String should always be "percentilesSpectatorHistogram" | yes       |
+| name        | A String for the output (result) name of the calculation.    | yes       |
+| field       | A field reference pointing to the aggregated histogram.      | yes       |
+| percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes       |
+
+## Appendix
+
+### Example Ingestion Spec
+Example of ingesting the sample wikipedia dataset with a histogram metric column:
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "http",
+        "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]
+      },
+      "inputFormat": { "type": "json" }
+    },
+    "dataSchema": {
+      "granularitySpec": {
+        "segmentGranularity": "day",
+        "queryGranularity": "minute",
+        "rollup": true
+      },
+      "dataSource": "wikipedia",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          "comment",
+          "isNew",
+          "isMinor",
+          "isAnonymous",
+          "user",
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
+      },
+      "metricsSpec": [
+        { "name": "count", "type": "count" },
+        { "name": "sum_added", "type": "longSum", "fieldName": "added" },
+        {
+          "name": "hist_added",
+          "type": "spectatorHistogram",
+          "fieldName": "added"
+        }
+      ]
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": { "type": "hashed" },
+      "forceGuaranteedRollup": true
+    }
+  }
+}
+```
+
+### Example Query
+Example query using the sample wikipedia dataset:
+```json
+{
+  "queryType": "timeseries",
+  "dataSource": {
+    "type": "table",
+    "name": "wikipedia"
+  },
+  "intervals": {
+    "type": "intervals",
+    "intervals": [
+      "0000-01-01/9999-12-31"
+    ]
+  },
+  "granularity": {
+    "type": "all"
+  },
+  "aggregations": [
+    {
+      "type": "spectatorHistogram",
+      "name": "histogram_added",
+      "fieldName": "added"
+    }
+  ],
+  "postAggregations": [
+    {
+      "type": "percentileSpectatorHistogram",
+      "name": "medianAdded",
+      "field": {
+        "type": "fieldAccess",
+        "fieldName": "histogram_added"
+      },
+      "percentile": "50.0"
+    }
+  ]
+}
+```
+Results in
+```json
+[
+  {
+    "result": {
+      "histogram_added": {
+        "0": 11096, "1": 632, "2": 297, "3": 187, "4": 322, "5": 161,
+        "6": 174, "7": 127, "8": 125, "9": 162, "10": 123, "11": 106,
+        "12": 95, "13": 104, "14": 95, "15": 588, "16": 540, "17": 690,
+        "18": 719, "19": 478, "20": 288, "21": 250, "22": 219, "23": 224,
+        "24": 737, "25": 424, "26": 343, "27": 266, "28": 232, "29": 217,
+        "30": 171, "31": 164, "32": 161, "33": 530, "34": 339, "35": 236,
+        "36": 181, "37": 152, "38": 113, "39": 128, "40": 80, "41": 75,
+        "42": 289, "43": 145, "44": 138, "45": 83, "46": 45, "47": 46,
+        "48": 64, "49": 65, "50": 71, "51": 421, "52": 525, "53": 59,
+        "54": 31, "55": 35, "56": 8, "57": 10, "58": 5, "59": 4, "60": 11,
+        "61": 10, "62": 5, "63": 2, "64": 2, "65": 1, "67": 1, "68": 1,
+        "69": 1, "70": 1, "71": 1, "78": 2
+      },
+      "medianAdded": 4.013975155279504
+    },
+    "timestamp": "2016-06-27T00:00:00.000Z"
+  }
+]
+```
+
+### Histogram Bucket Boundaries
+These are the upper bounds of each bucket index. There are 276 buckets.
+The first bucket index is 0 and the last bucket index is 275.
+As you can see the bucket widths increase as the bucket index increases. This leads to a greater absolute error for larger values, but maintains a relative error of rough percentage across the number range.
+i.e the maximum error at value 10 is 0 since the bucket width is 1. But for a value of 16,000,000,000 the bucket width is 1,431,655,768 giving an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range (0.1%, 3%).

Review Comment:
   ```suggestion
   For example, the maximum error at value 10 is zero since the bucket width is 1 (the difference of 11-10). For a value of 16,000,000,000, the bucket width is 1,431,655,768 giving an error of up to ~8.9%. In practice, the observed error of computed percentiles is in the range of (0.1%, 3%).
   ```
   Not sure if it's 11-10 or 10-9. Also it's not clear how you get 8.9%. Consider explaining in more detail, or linking to relevant docs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "suneet-s (via GitHub)" <gi...@apache.org>.
suneet-s commented on PR #15340:
URL: https://github.com/apache/druid/pull/15340#issuecomment-1890071905

   @maytasm Good with me to merge once CI is green. Thanks @bsyk for the contribution and patience with the review :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #15340:
URL: https://github.com/apache/druid/pull/15340#discussion_r1450997952


##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogramPercentilesPostAggregator.java:
##########
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.primitives.Doubles;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.post.PostAggregatorIds;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.column.ColumnType;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.Map;
+import java.util.Set;
+
+public class SpectatorHistogramPercentilesPostAggregator implements PostAggregator
+{
+  private final String name;
+  private final PostAggregator field;
+
+  private final double[] percentiles;
+
+  public static final String TYPE_NAME = "percentilesSpectatorHistogram";
+
+  @JsonCreator
+  public SpectatorHistogramPercentilesPostAggregator(
+      @JsonProperty("name") final String name,
+      @JsonProperty("field") final PostAggregator field,
+      @JsonProperty("percentiles") final double[] percentiles
+  )
+  {
+    this.name = Preconditions.checkNotNull(name, "name is null");
+    this.field = Preconditions.checkNotNull(field, "field is null");
+    this.percentiles = Preconditions.checkNotNull(percentiles, "array of fractions is null");
+    Preconditions.checkArgument(this.percentiles.length >= 1, "Array of percentiles cannot " +
+                                                              "be empty");
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @Override
+  public ColumnType getType(ColumnInspector signature)
+  {
+    return ColumnType.DOUBLE_ARRAY;
+  }
+
+  @JsonProperty
+  public PostAggregator getField()
+  {
+    return field;
+  }
+
+  @JsonProperty
+  public double[] getPercentiles()
+  {
+    return percentiles;
+  }
+
+  @Override
+  public Object compute(final Map<String, Object> combinedAggregators)
+  {
+    final SpectatorHistogram sketch = (SpectatorHistogram) field.compute(combinedAggregators);
+    return sketch.getPercentileValues(percentiles);
+  }
+
+  @Override
+  public Comparator<Double> getComparator()
+  {
+    return Doubles::compare;

Review Comment:
   this doesn't seem like the correct comparator if the output type is double array (ColumnType has a comparator available `type.getNullableStrategy()` if there might be nulls, or `type.getStrategy()` if not that should work)



##########
extensions-contrib/spectator-histogram/src/main/java/org/apache/druid/spectator/histogram/SpectatorHistogramAggregatorFactory.java:
##########
@@ -0,0 +1,372 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.druid.query.aggregation.AggregateCombiner;
+import org.apache.druid.query.aggregation.Aggregator;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.AggregatorFactoryNotMergeableException;
+import org.apache.druid.query.aggregation.AggregatorUtil;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.aggregation.ObjectAggregateCombiner;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(SpectatorHistogramAggregatorFactory.TYPE_NAME)
+public class SpectatorHistogramAggregatorFactory extends AggregatorFactory
+{
+  @Nonnull
+  private final String name;
+
+  @Nonnull
+  private final String fieldName;
+
+  @Nonnull
+  private final byte cacheTypeId;
+
+  public static final String TYPE_NAME = "spectatorHistogram";
+
+  @JsonCreator
+  public SpectatorHistogramAggregatorFactory(
+      @JsonProperty("name") final String name,
+      @JsonProperty("fieldName") final String fieldName
+  )
+  {
+    this(name, fieldName, AggregatorUtil.SPECTATOR_HISTOGRAM_CACHE_TYPE_ID);
+  }
+
+  public SpectatorHistogramAggregatorFactory(
+      final String name,
+      final String fieldName,
+      final byte cacheTypeId
+  )
+  {
+    this.name = Objects.requireNonNull(name, "Must have a valid, non-null aggregator name");
+    this.fieldName = Objects.requireNonNull(fieldName, "Parameter fieldName must be specified");
+    this.cacheTypeId = cacheTypeId;
+  }
+
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(
+        cacheTypeId
+    ).appendString(fieldName).build();
+  }
+
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+    return new SpectatorHistogramAggregator(metricFactory.makeColumnValueSelector(fieldName));
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
+  {
+    return new SpectatorHistogramBufferAggregator(metricFactory.makeColumnValueSelector(fieldName));
+  }
+
+  // This is used when writing metrics to segment files to check whether the column is sorted.
+  // Since there is no sensible way really to compare histograms, compareTo always returns 1.
+  public static final Comparator<SpectatorHistogram> COMPARATOR = (o, o1) -> {
+    if (o == null && o1 == null) {
+      return 0;
+    } else if (o != null && o1 == null) {
+      return -1;
+    } else if (o == null) {
+      return 1;
+    }
+    return Integer.compare(o.hashCode(), o1.hashCode());
+  };
+
+  @Override
+  public Comparator getComparator()
+  {
+    return COMPARATOR;
+  }
+
+  @Override
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)
+  {
+    if (lhs == null) {
+      return rhs;
+    }
+    if (rhs == null) {
+      return lhs;
+    }
+    SpectatorHistogram lhsHisto = (SpectatorHistogram) lhs;
+    SpectatorHistogram rhsHisto = (SpectatorHistogram) rhs;
+    lhsHisto.merge(rhsHisto);
+    return lhsHisto;
+  }
+
+  @Override
+  public AggregatorFactory getCombiningFactory()
+  {
+    return new SpectatorHistogramAggregatorFactory(name, name);
+  }
+
+  @Override
+  public AggregatorFactory getMergingFactory(AggregatorFactory other) throws AggregatorFactoryNotMergeableException
+  {
+    if (other.getName().equals(this.getName()) && this.getClass() == other.getClass()) {
+      return getCombiningFactory();
+    } else {
+      throw new AggregatorFactoryNotMergeableException(this, other);
+    }
+  }
+
+  @Override
+  public List<AggregatorFactory> getRequiredColumns()
+  {
+    return Collections.singletonList(
+        new SpectatorHistogramAggregatorFactory(
+            fieldName,
+            fieldName
+        )
+    );
+  }
+
+  @Override
+  public Object deserialize(Object serializedHistogram)
+  {
+    return SpectatorHistogram.deserialize(serializedHistogram);
+  }
+
+  @Nullable
+  @Override
+  public Object finalizeComputation(@Nullable Object object)
+  {
+    return object;
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @JsonProperty
+  public String getFieldName()
+  {
+    return fieldName;
+  }
+
+  @Override
+  public List<String> requiredFields()
+  {
+    return Collections.singletonList(fieldName);
+  }
+
+  @Override
+  public String getComplexTypeName()
+  {
+    return TYPE_NAME;
+  }
+
+  @Override
+  public ValueType getType()
+  {
+    return ValueType.COMPLEX;
+  }
+
+  @Override
+  public ValueType getFinalizedType()
+  {
+    return ValueType.COMPLEX;
+  }

Review Comment:
   these methods are deprecated and will be removed at some point, please implement `getIntermediateType` and `getResultType` instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Add SpectatorHistogram extension (druid)

Posted by "maytasm (via GitHub)" <gi...@apache.org>.
maytasm commented on PR #15340:
URL: https://github.com/apache/druid/pull/15340#issuecomment-1891017570

   @bsyk 
   Thank you for the change!
   Merging this in as CI failure is unrelated to this PR (and master branch is failing on the same failure)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org