Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/10/02 04:41:48 UTC

[GitHub] [druid] jon-wei opened a new issue #10462: [WIP] 0.20.0 Release Notes

jon-wei opened a new issue #10462:
URL: https://github.com/apache/druid/issues/10462


   Apache Druid 0.20.0 contains around 140 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 36 contributors. Refer to the [complete list of changes](https://github.com/apache/druid/compare/0.19.0...0.20.0) and [everything tagged to the milestone](https://github.com/apache/druid/milestone/40) for further details.
   
   # <a name="20-new-features" href="#20-new-features">#</a> New Features
   
   ## <a name="20-hash-segment-pruning" href="#20-hash-segment-pruning">#</a> Query segment pruning with hash partitioning
   
   Druid now supports query-time segment pruning (excluding segments that cannot contain matching rows from the set a query reads) for hash-partitioned segments. This optimization applies when every dimension in the `partitionDimensions` of the ingestion-time hash partition spec appears in the query's filter set, and those filters match discrete values of the `partitionDimensions` (e.g., selector filters). Segment pruning with hash partitioning is not supported with non-discrete filters such as bound filters.
   
   To take advantage of this feature with existing segments, you will need to reingest them: segment pruning requires a `partitionFunction` to be stored together with the segments, and segments created by older versions of Druid do not have one. It is not necessary to specify the `partitionFunction` explicitly, as the default is the same partition function that was used in prior versions of Druid.
   
   Note that segments created with a default `partitionDimensions` value (partition by all dimensions + the time column) cannot be pruned in this manner; the segments need to be created with an explicit `partitionDimensions`.
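
The pruning rule described above can be sketched in a few lines of Python. This is an illustration only: `zlib.crc32` stands in for Druid's real partition function, and the segment dictionaries, the `bucket` field, and the function names are hypothetical.

```python
import zlib


def bucket_for(values, num_buckets):
    """Map partition-dimension values to a bucket id.

    zlib.crc32 is a stand-in hash used only for illustration.
    """
    key = "\x00".join(values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def prune_segments(segments, partition_dimensions, filters, num_buckets):
    """Return the segments that may contain rows matching the filters.

    `filters` maps a dimension name to one discrete value, like a selector filter.
    """
    # Pruning only applies when every partition dimension is filtered on.
    if not partition_dimensions or any(d not in filters for d in partition_dimensions):
        return list(segments)
    wanted = bucket_for([filters[d] for d in partition_dimensions], num_buckets)
    return [s for s in segments if s["bucket"] == wanted]


segments = [{"id": f"seg_{i}", "bucket": i} for i in range(4)]
pruned = prune_segments(segments, ["country"], {"country": "US"}, 4)
unpruned = prune_segments(segments, ["country"], {"city": "Paris"}, 4)
```

With a filter on the partition dimension, only one of the four hypothetical segments remains a read candidate; a filter on some other dimension leaves all four in place.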
   
   https://github.com/apache/druid/pull/9810
   https://github.com/apache/druid/pull/10288
   
   ## <a name="20-cluster-wide-default-query-context" href="#20-cluster-wide-default-query-context">#</a> Cluster-wide default query context settings
   
   It is now possible to set cluster-wide default query context properties by adding a configuration of the form `druid.query.override.default.context.*`, with `*` replaced by the property name.
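
As a rough sketch of how such properties could map onto a query's context, the Python below strips the prefix and merges the result with a per-query context. The merge order (values set on the query win over cluster defaults) is an assumption here, and the helper names are hypothetical.

```python
PREFIX = "druid.query.override.default.context."


def default_context(properties):
    """Collect cluster-wide default context entries from runtime properties."""
    return {k[len(PREFIX):]: v for k, v in properties.items() if k.startswith(PREFIX)}


def effective_context(properties, query_context):
    """Merge defaults with a query's own context (assumed: query values win)."""
    merged = default_context(properties)
    merged.update(query_context)
    return merged


runtime_properties = {
    "druid.query.override.default.context.timeout": "30000",
    "druid.query.override.default.context.useCache": "false",
    "druid.server.http.numThreads": "60",  # unrelated property, ignored
}
ctx = effective_context(runtime_properties, {"timeout": "5000"})
```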
   
   https://github.com/apache/druid/pull/10208
   
   ## <a name="20-improved-retention-rules-ui" href="#20-improved-retention-rules-ui">#</a> Improved retention rules UI
   
   The retention rules UI in the web console has been improved. It now provides suggestions and basic validation in the period dropdown, shows the cluster default rules, and makes editing the default rules more accessible.
   
   https://github.com/apache/druid/pull/10226
   
   ## <a name="20-groupby-offset" href="#20-groupby-offset">#</a> `offset` parameter for GroupBy and Scan queries
   
   It is now possible to set an `offset` parameter for GroupBy and Scan queries, which tells Druid to skip a number of rows when returning results. Please see https://druid.apache.org/docs/latest/querying/limitspec.html and https://druid.apache.org/docs/latest/querying/scan-query.html for details.
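
For illustration, here are two native queries written as Python dictionaries, one GroupBy (offset inside the `limitSpec`) and one Scan (offset at the top level). The datasource, interval, and column names are hypothetical, and `page` only mimics the skip-then-limit behavior on an in-memory list.

```python
group_by_query = {
    "queryType": "groupBy",
    "dataSource": "wikipedia",  # hypothetical datasource
    "intervals": ["2020-01-01/2020-02-01"],
    "granularity": "all",
    "dimensions": ["channel"],
    "aggregations": [{"type": "count", "name": "rows"}],
    "limitSpec": {
        "type": "default",
        "columns": [{"dimension": "rows", "direction": "descending"}],
        "offset": 20,  # skip the first 20 result rows
        "limit": 10,   # then return at most 10
    },
}

scan_query = {
    "queryType": "scan",
    "dataSource": "wikipedia",
    "intervals": ["2020-01-01/2020-02-01"],
    "columns": ["__time", "channel"],
    "offset": 20,
    "limit": 10,
}


def page(rows, offset, limit):
    """Skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]
```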
   
   https://github.com/apache/druid/pull/10235
   https://github.com/apache/druid/pull/10233
   
   ## <a name="20-sql-offset" href="#20-sql-offset">#</a> `OFFSET` clause for SQL queries
   
   Druid SQL queries now support an `OFFSET` clause. Please see https://druid.apache.org/docs/latest/querying/sql.html#offset for details.
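
A minimal sketch of paging with `LIMIT` and `OFFSET`, assuming a hypothetical `wikipedia` table and a hypothetical `paged_sql` helper. The payload uses the `{"query": ...}` shape accepted by the Druid SQL HTTP API; nothing is sent over the network here.

```python
import json


def paged_sql(base_sql, page_number, page_size):
    """Append LIMIT/OFFSET for a zero-based page number."""
    offset = page_number * page_size
    return f"{base_sql} LIMIT {page_size} OFFSET {offset}"


sql = paged_sql(
    "SELECT channel, COUNT(*) AS edits FROM wikipedia "
    "GROUP BY channel ORDER BY edits DESC",
    page_number=2,
    page_size=10,
)
payload = json.dumps({"query": sql})
```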
   
   https://github.com/apache/druid/pull/10279 
   
   ## <a name="20-sql-contains" href="#20-sql-contains">#</a> Substring search operators
   
   Druid has added new substring search operators in its expression language and for SQL queries.
   
   Please see documentation for `CONTAINS_STRING` and `ICONTAINS_STRING` string functions for Druid SQL (https://druid.apache.org/docs/latest/querying/sql.html#string-functions) and documentation for `contains_string` and `icontains_string` for the Druid expression language (https://druid.apache.org/docs/latest/misc/math-expr.html#string-functions).
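
The intended semantics can be approximated in Python as follows. This sketch ignores null handling, and the table and column in the SQL string are hypothetical.

```python
def contains_string(value, needle):
    """Case-sensitive substring check."""
    return needle in value


def icontains_string(value, needle):
    """Case-insensitive substring check."""
    return needle.lower() in value.lower()


# The same check expressed in Druid SQL (hypothetical table and column).
sql = "SELECT page FROM wikipedia WHERE ICONTAINS_STRING(page, 'druid')"
```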
   
   https://github.com/apache/druid/pull/10350
   
   ## <a name="20-sql-union-all" href="#20-sql-union-all">#</a> UNION ALL operator for SQL queries
   
   Druid SQL queries now support the `UNION ALL` operator, which fuses the results of multiple queries together. Please see https://druid.apache.org/docs/latest/querying/sql.html#union-all for details on what query shapes are supported by this operator.
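
As a small illustration, the SQL string below combines two queries with a top-level `UNION ALL`; the table names are hypothetical. The `union_all` helper mirrors the standard SQL semantics on in-memory rows: results are concatenated and duplicates are kept.

```python
sql = (
    "SELECT channel, COUNT(*) AS edits FROM wikipedia_2019 GROUP BY channel\n"
    "UNION ALL\n"
    "SELECT channel, COUNT(*) AS edits FROM wikipedia_2020 GROUP BY channel"
)


def union_all(*result_sets):
    """Concatenate result sets without removing duplicates."""
    combined = []
    for rows in result_sets:
        combined.extend(rows)
    return combined


rows = union_all([("#en", 10)], [("#en", 10), ("#fr", 3)])
```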
   
   https://github.com/apache/druid/pull/10324
   
   ## <a name="20-vectorized-min-max" href="#20-vectorized-min-max">#</a> Vectorization support for more aggregators
   
   Vectorization support has been added for several aggregation types: numeric min/max aggregators, variance aggregators, ANY aggregators, and aggregators from the `druid-histogram` extension.
   
   https://github.com/apache/druid/pull/10260 - numeric min/max
   https://github.com/apache/druid/pull/10304 - histogram
   https://github.com/apache/druid/pull/10338 - ANY
   https://github.com/apache/druid/pull/10390 - variance
   
   ## <a name="20-vectorized-virtual-columns" href="#20-vectorized-virtual-columns">#</a> Vectorization support for expression virtual columns
   
   Expression virtual columns now have vectorization support (depending on the expressions being used), which can result in a 3-5x performance improvement in some cases.
   
   Please see https://druid.apache.org/docs/latest/misc/math-expr.html#vectorization-support for details on the specific expressions that support vectorization, and https://druid.apache.org/docs/latest/querying/query-context.html#vectorization-parameters for more information on query context parameters that control vectorization.
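
A sketch of a native query that opts in to vectorized execution through the query context. The datasource, interval, and column names are hypothetical. Setting a parameter to `"force"` makes the query fail if it cannot be vectorized, which is useful for checking whether a given expression is supported.

```python
query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",  # hypothetical datasource
    "intervals": ["2020-01-01/2020-02-01"],
    "granularity": "day",
    "virtualColumns": [
        {
            "type": "expression",
            "name": "delta_x2",
            "expression": "delta * 2",
            "outputType": "LONG",
        }
    ],
    "aggregations": [
        {"type": "longSum", "name": "sum_delta_x2", "fieldName": "delta_x2"}
    ],
    # Query context parameters that control vectorization.
    "context": {"vectorize": "force", "vectorizeVirtualColumns": "force"},
}
```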
   
   https://github.com/apache/druid/pull/10388
   https://github.com/apache/druid/pull/10401
   https://github.com/apache/druid/pull/10432
   
   ## <a name="20-split-hint-max-files" href="#20-split-hint-max-files">#</a> Subtask file count limits for parallel batch ingestion
   
   The size-based `splitHintSpec` now supports a new `maxNumFiles` parameter, which limits how many files can be assigned to individual subtasks in parallel batch ingestion. 
   
   The segment-based `splitHintSpec` used for reingesting data from existing Druid segments also has a new `maxNumSegments` parameter which functions similarly.
   
   Please see https://druid.apache.org/docs/latest/ingestion/native-batch.html#split-hint-spec for more details.
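
To make the interaction of the two limits concrete, the sketch below packs files into subtask splits greedily, closing a split when adding a file would exceed either the byte limit or the file-count limit. The greedy strategy and the `make_splits` helper are assumptions for illustration; Druid's actual packing may differ.

```python
split_hint_spec = {
    "type": "maxSize",
    "maxSplitSize": 1000,  # bytes per split (small value for the demo)
    "maxNumFiles": 2,      # files per split
}


def make_splits(file_sizes, max_split_size, max_num_files):
    """Greedily group file sizes into splits under both limits."""
    splits, current, current_bytes = [], [], 0
    for size in file_sizes:
        too_big = current_bytes + size > max_split_size
        too_many = len(current) >= max_num_files
        if current and (too_big or too_many):
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


by_count = make_splits([10] * 5, split_hint_spec["maxSplitSize"], split_hint_spec["maxNumFiles"])
by_bytes = make_splits([600, 600, 100], 1000, 10)
```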
   
   https://github.com/apache/druid/pull/10243
   
   ## <a name="20-redis-extension" href="#20-redis-extension">#</a> Redis cache extension enhancements
   
   The Redis cache extension now supports Redis Cluster, selecting which database is used, connecting to password-protected servers, and period-style configurations for the `expiration` and `timeout` properties.
   
   https://github.com/apache/druid/pull/10240
   
   ## <a name="20-auto-compaction-partition" href="#20-auto-compaction-partition">#</a> Support for all partitioning schemes for auto-compaction
   
   A partitioning spec can now be defined for auto-compaction, allowing users to repartition their data at compaction time. Please see the documentation for the new `partitionsSpec` property in the compaction `tuningConfig` for more details: https://druid.apache.org/docs/latest/configuration/index.html#compaction-tuningconfig
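
A minimal sketch of an auto-compaction config that repartitions data with hash partitioning, written as a Python dictionary. The datasource name, shard count, and partition dimension are hypothetical values.

```python
import json

compaction_config = {
    "dataSource": "wikipedia",  # hypothetical datasource
    "tuningConfig": {
        "partitionsSpec": {
            "type": "hashed",
            "numShards": 4,
            "partitionDimensions": ["channel"],
        }
    },
}

# JSON body that would be submitted as the datasource's compaction config.
body = json.dumps(compaction_config, indent=2)
```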
   
   https://github.com/apache/druid/pull/10307
   
   ## <a name="20-combining-input-source" href="#20-combining-input-source">#</a> Combining InputSource
   
   A new combining InputSource has been added, allowing the user to combine multiple input sources during ingestion. Please see https://druid.apache.org/docs/latest/ingestion/native-batch.html#combining-input-source for more details.
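
For illustration, an `ioConfig` that reads from two delegate input sources in one parallel ingestion task. The inline data, directory path, and file filter are hypothetical.

```python
io_config = {
    "type": "index_parallel",
    "inputSource": {
        "type": "combining",
        "delegates": [
            # Each delegate is an ordinary input source.
            {"type": "inline", "data": '{"channel": "#en", "added": 1}'},
            {"type": "local", "baseDir": "/data/new", "filter": "*.json"},
        ],
    },
    "inputFormat": {"type": "json"},
}

delegate_types = [d["type"] for d in io_config["inputSource"]["delegates"]]
```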
   
   https://github.com/apache/druid/pull/10387
   
   ## <a name="20-autocompaction-status-api" href="#20-autocompaction-status-api">#</a> Auto-compaction status API
   
   A new coordinator API which shows the status of auto-compaction for a datasource has been added. The new API shows whether auto-compaction is enabled for a datasource, and a summary of how far compaction has progressed. 
   
   The web console has also been updated to show this information:
   
   https://user-images.githubusercontent.com/177816/94326243-9d07e780-ff57-11ea-9f80-256fa08580f0.png
   
   TBD: pending docs for this feature, will link when available
   
   https://github.com/apache/druid/pull/10371
   https://github.com/apache/druid/pull/10438
   
   ## <a name="20-auto-num-shards" href="#20-auto-num-shards">#</a> Automatically determine numShards for parallel ingestion hash partitioning
   
   When hash partitioning is used in parallel batch ingestion, it is no longer necessary to specify `numShards` in the partition spec. Druid can now automatically determine a number of shards by scanning the data in a new ingestion phase that determines the cardinalities of the partitioning key.
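
One way to picture the idea: count the distinct partitioning keys, then divide by a target number of rows per segment. The formula, the exact distinct count (rather than an approximate sketch), and the `estimate_num_shards` helper are all assumptions made for illustration, not a description of Druid's implementation.

```python
import math


def estimate_num_shards(rows, partition_dimensions, target_rows_per_segment):
    """Estimate a shard count from the cardinality of the partitioning key."""
    distinct_keys = {tuple(row[d] for d in partition_dimensions) for row in rows}
    return max(1, math.ceil(len(distinct_keys) / target_rows_per_segment))


# 50 hypothetical rows with 10 distinct values of the partitioning key.
rows = [{"channel": f"c{i % 10}", "user": "u"} for i in range(50)]
num_shards = estimate_num_shards(rows, ["channel"], target_rows_per_segment=4)
```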
   
   https://github.com/apache/druid/pull/10419
   
   ## <a name="20-task-slot-metrics" href="#20-task-slot-metrics">#</a> Task slot usage metrics
   
   New task slot usage metrics have been added. Please see the entries for the `taskSlot` metrics at https://druid.apache.org/docs/latest/operations/metrics.html#indexing-service for more details.
   
   https://github.com/apache/druid/pull/10379
   
   ## <a name="20-disable-server-version" href="#20-disable-server-version">#</a> Disable sending server version in response headers
   
   It is now possible to disable sending of server version information in Druid's response headers.
   
   This is controlled by a new property `druid.server.http.sendServerVersion`, which defaults to `true`.
   
   https://github.com/apache/druid/pull/9832
   
   # <a name="20-bugs" href="#20-bugs">#</a> Bug fixes
   
   ## <a name="20-historical-no-timeline" href="#20-historical-no-timeline">#</a> Fix query correctness issue when historical has no segment timeline
   
   Druid 0.20.0 fixes a query correctness issue that occurs when a broker issues a query expecting a historical to have certain segments for a datasource, but the historical no longer has any segments for that datasource when it processes the query (e.g., they were all unloaded first). Prior to 0.20.0, the query would return successfully but silently omit the results from the missing segments. In 0.20.0, queries will now fail in such situations.
   
   https://github.com/apache/druid/pull/10199
   
   ## <a name="20-result-caching" href="#20-result-caching">#</a> Fix issue preventing result-level cache from being populated
   
   Druid 0.20.0 fixes an issue introduced in 0.19.0 (https://github.com/apache/druid/issues/10337) which can prevent query caches from being populated when result-level caching is enabled.
   
   https://github.com/apache/druid/pull/10341
   
   ## <a name="20-variance-comparator" href="#20-variance-comparator">#</a> Fix for variance aggregator ordering
   
   The variance aggregator previously used an incorrect comparator that compared using an aggregator's internal `count` variable instead of the variance.
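
The effect of the bug can be shown with a small Python sketch. The `VarianceState` class and its field names are hypothetical; only the `count` field is named in the description above.

```python
from dataclasses import dataclass


@dataclass
class VarianceState:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean

    def variance(self):
        """Population variance derived from the intermediate state."""
        return self.m2 / self.count if self.count else 0.0


a = VarianceState(count=100, mean=5.0, m2=50.0)  # variance 0.5
b = VarianceState(count=10, mean=5.0, m2=90.0)   # variance 9.0

by_count = sorted([a, b], key=lambda s: s.count)          # buggy ordering
by_variance = sorted([a, b], key=lambda s: s.variance())  # intended ordering
```

Ordering by `count` places `b` first even though its variance is larger, which is the kind of incorrect ordering the fix addresses.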
   
   https://github.com/apache/druid/pull/10340
   
   ## <a name="20-limitspec-cache" href="#20-limitspec-cache">#</a> Fix incorrect caching for groupBy queries with limit specs
   
   Druid 0.20.0 fixes an issue with groupBy queries and caching, where the limitSpec of the query was not considered in the cache key, leading to potentially incorrect results if queries that are identical except for the limitSpec are issued.
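
A simplified sketch of why the missing field matters. The `cache_key` helper and its hashing scheme are hypothetical and unrelated to Druid's real cache key format; the point is only that two queries differing in `limitSpec` collide when that field is left out of the key.

```python
import hashlib
import json


def cache_key(query, include_limit_spec):
    """Hash the query, optionally leaving out the limitSpec (the old behavior)."""
    parts = {k: v for k, v in query.items() if include_limit_spec or k != "limitSpec"}
    encoded = json.dumps(parts, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


base = {"queryType": "groupBy", "dataSource": "wikipedia", "dimensions": ["channel"]}
q1 = dict(base, limitSpec={"type": "default", "limit": 10})
q2 = dict(base, limitSpec={"type": "default", "limit": 100})

collides_without = cache_key(q1, False) == cache_key(q2, False)
collides_with = cache_key(q1, True) == cache_key(q2, True)
```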
   
   https://github.com/apache/druid/pull/10093
   
   # <a name="20-upgrading-from-previous" href="#20-upgrading-from-previous">#</a> Upgrading to Druid 0.20.0
   
   Please be aware of the following considerations when upgrading from 0.19.0 to 0.20.0. If you're updating from an earlier version than 0.19.0, please see the release notes of the relevant intermediate versions.
   
   ## <a name="20-default-max-size" href="#20-default-max-size">#</a> Default `maxSize`
   
   `druid.server.maxSize` will now default to the sum of `maxSize` values defined within the `druid.segmentCache.locations`. The user can still provide a custom value for `druid.server.maxSize` which will take precedence over the default value.
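
The new defaulting rule amounts to the following, shown as a Python sketch with hypothetical paths, sizes, and helper name.

```python
def effective_server_max_size(configured_max_size, segment_cache_locations):
    """Use the explicit druid.server.maxSize if set, else sum the location sizes."""
    if configured_max_size is not None:
        return configured_max_size
    return sum(location["maxSize"] for location in segment_cache_locations)


locations = [
    {"path": "/mnt/disk1/segments", "maxSize": 100_000_000_000},
    {"path": "/mnt/disk2/segments", "maxSize": 200_000_000_000},
]
defaulted = effective_server_max_size(None, locations)
explicit = effective_server_max_size(150_000_000_000, locations)
```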
   
   https://github.com/apache/druid/pull/10255
   
   ## <a name="20-id-name-change" href="#20-id-name-change">#</a> Compaction and kill task ID changes
   
   Compaction and kill tasks issued by the coordinator will now have their task IDs prefixed by `coordinator-issued`, while user-issued kill tasks will be prefixed by `api-issued`.
   
   https://github.com/apache/druid/pull/10278
   
   ## <a name="20-new-size-limit-split" href="#20-new-size-limit-split">#</a> New size limits for parallel ingestion split hint specs
   
   The size-based and segment-based `splitHintSpec` for parallel batch ingestion now apply a default file/segment limit of 1000 per subtask, controlled by the `maxNumFiles` and `maxNumSegments` parameters, respectively.
   
   https://github.com/apache/druid/pull/10243
   
   ## <a name="20-new-agg-methods" href="#20-new-agg-methods">#</a> New `PostAggregator` and `AggregatorFactory` methods
   
   Users who have developed an extension with custom `PostAggregator` or `AggregatorFactory` implementations will need to update their extensions, as these two interfaces have new methods defined in 0.20.0.
   
   `PostAggregator` now has a new method:
   
   ```
     ValueType getType();
   ```
   
   To support type information on `PostAggregator`, `AggregatorFactory` also has 2 new methods:
   
   ```
     public abstract ValueType getType();
   
     public abstract ValueType getFinalizedType();
   ```
   
   ## <a name="20-new-expr-methods" href="#20-new-expr-methods">#</a> New `Expr`-related methods
   
   Users who have developed an extension with custom `Expr` implementations will need to update their extensions, as `Expr` and related interfaces have changed in 0.20.0. Please see the PR below for details:
   
   https://github.com/apache/druid/pull/10401
   
   Please see https://github.com/apache/druid/pull/9638 for more details on the interface changes.
   
   ## <a name="20-sequence-time" href="#20-sequence-time">#</a> More accurate `query/cpu/time` metric
   
   In 0.20.0, the accuracy of the `query/cpu/time` metric has been improved. Previously, it did not account for certain portions of work during query processing, described in more detail in the following PR: 
   
   https://github.com/apache/druid/pull/10377
   
   ## <a name="20-audit-log-cols" href="#20-audit-log-cols">#</a> New audit log service metric columns
   
   If you are using audit logging, please be aware that new columns have been added to the audit log service metric (`comment`, `remote_address`, and `created_date`). An optional `payload` column has also been added, which can be enabled by setting `druid.audit.manager.includePayloadAsDimensionInMetric` to `true`.
   
   https://github.com/apache/druid/pull/10373
   
   ## <a name="20-request-log-sql-context" href="#20-request-log-sql-context">#</a> `sqlQueryContext` in request logs
   
   If you are using query request logging, the request log events will now include the `sqlQueryContext` for SQL queries.
   
   https://github.com/apache/druid/pull/10368
   
   ## <a name="20-last-compaction-state" href="#20-last-compaction-state">#</a> Additional per-segment state in metadata store
   
   Hash-partitioned segments created by Druid 0.20.0 will now have additional `partitionFunction` data in the metadata store.
   
   Additionally, compaction tasks will now store additional per-segment information in the metadata store, used for tracking compaction history.
   
   https://github.com/apache/druid/pull/10288
   https://github.com/apache/druid/pull/10413
   
   # <a name="20-credits" href="#20-credits">#</a> Credits
   
   Thanks to everyone who contributed to this release!
   
   @a2l007
   @abhishekagarwal87
   @abhishekrb19
   @ArvinZheng
   @belugabehr
   @capistrant
   @ccaominh
   @clintropolis
   @code-crusher
   @dylwylie
   @fermelone
   @FrankChen021
   @gianm
   @himanshug
   @jihoonson
   @jon-wei
   @joykent99
   @kroeders
   @lightghli
   @mans2singh
   @maytasm
   @medb
   @mghosh4
   @nishantmonu51
   @pan3793
   @richardstartin
   @sthetland
   @suneet-s
   @tarunparackal
   @tdt17
   @tourvi
   @vogievetsky
   @wjhypo
   @xiangqiao123
   @xvrl
   
   
   ---
   
   TBD: are there any breaking changes from
   https://github.com/apache/druid/pull/10203
   https://github.com/apache/druid/pull/9810
   https://github.com/apache/druid/pull/10307


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on issue #10462: [WIP] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
jon-wei commented on issue #10462:
URL: https://github.com/apache/druid/issues/10462#issuecomment-707414818


   @asdf2014 Sounds good, I've added that to the Other features section




[GitHub] [druid] 2bethere commented on issue #10462: [WIP] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
2bethere commented on issue #10462:
URL: https://github.com/apache/druid/issues/10462#issuecomment-702527317


   I suggest grouping things by
   1. Ingestion
   2. Query
   3. Compaction
   4. UI
   5. Extensions
   
   




[GitHub] [druid] ccaominh commented on issue #10462: [WIP] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
ccaominh commented on issue #10462:
URL: https://github.com/apache/druid/issues/10462#issuecomment-702934080


   For the docs links, pointing to `0.20.0` instead of `latest` will ensure the links are accurate later after future versions are released.




[GitHub] [druid] jon-wei commented on issue #10462: [WIP] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
jon-wei commented on issue #10462:
URL: https://github.com/apache/druid/issues/10462#issuecomment-706079665


   @bavaria95 Sure, I've added that to the bug fix section.




[GitHub] [druid] jihoonson closed issue #10462: [DRAFT] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
jihoonson closed issue #10462:
URL: https://github.com/apache/druid/issues/10462


   






[GitHub] [druid] bavaria95 commented on issue #10462: [WIP] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
bavaria95 commented on issue #10462:
URL: https://github.com/apache/druid/issues/10462#issuecomment-704325325


   I was wondering whether it makes sense to mention the fix of stringFirst/stringLast rollup during ingestion (#10332). I think it's quite an important fix
   




[GitHub] [druid] asdf2014 commented on issue #10462: [WIP] 0.20.0 Release Notes

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on issue #10462:
URL: https://github.com/apache/druid/issues/10462#issuecomment-706845964


   Hi, @jon-wei. Thank you for organizing this release, it's very exciting. In addition, #10203 has changed user-facing configuration behavior, so it should be explained in these release notes.

