Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/09 19:00:34 UTC

[GitHub] [druid] jihoonson opened a new issue #9652: [Draft] 0.18.0 release notes

jihoonson opened a new issue #9652: [Draft] 0.18.0 release notes
URL: https://github.com/apache/druid/issues/9652
 
 
   Apache Druid 0.18.0 contains over 200 new features, performance enhancements, bug fixes, and major documentation improvements from 41 contributors. Check out the [complete list of changes](https://github.com/apache/druid/compare/0.17.0...0.18.0) and [everything tagged to the milestone](https://github.com/apache/druid/milestone/37).
   
   # New Features
   
   ## Join support
   
   Join is a key operation in data analytics. Prior to 0.18.0, Druid supported some join-related features, such as Lookups or semi-joins in SQL. However, these cover only a limited set of use cases. For other join use cases, users had to denormalize their datasources at ingestion time instead of joining them at query time, which could result in exploding data volume and long ingestion times.
   
   Druid 0.18.0 supports real joins for the first time. For now, Druid supports INNER, LEFT, and CROSS joins. For native queries, a new join datasource represents a join of two datasources. Currently, only left-deep joins are allowed: the left datasource must be a table or another join datasource, while the right datasource must be a lookup, inline, or query datasource. Note that joining two Druid table datasources directly is not supported yet; a join query can contain only one table datasource.
   
   Druid SQL also supports joins. The SQL is internally translated into one or several native queries that include join datasources.
   
   When a join query is issued, the Broker first _evaluates_ all datasources except for the primary datasource, which is the only table datasource in the query. The evaluation can include executing subqueries for query datasources. Once the Broker has evaluated all non-primary datasources, it replaces them with inline datasources and sends the rewritten query to data nodes (see the "Query inlining in Brokers" section below for more details). Data nodes use a hash join to process join queries. They build a hash table for each non-primary leaf datasource unless one already exists.
   
   Note that only lookup datasources have a pre-built hash table for now. As a result, join performance could be suboptimal for any other datasource type.
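
As a rough illustration of the hash join described above, here is a minimal Python sketch. It is not Druid's implementation: the function names, the row format (plain dicts), and the `right_prefix` argument are assumptions made for this example. It builds a hash table from the right-hand rows once, then probes it for each left-hand row.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Build phase: index the right-hand rows by their join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def hash_join(left_rows, right_table, left_key, join_type="INNER", right_prefix="r."):
    """Probe phase: look up each left row in the pre-built table."""
    for left in left_rows:
        matches = right_table.get(left[left_key], [])
        if matches:
            for right in matches:
                joined = dict(left)
                # Prefix right-hand columns so they cannot clash with left-hand ones.
                joined.update({right_prefix + k: v for k, v in right.items()})
                yield joined
        elif join_type == "LEFT":
            # LEFT join keeps unmatched left rows; right-hand columns are absent.
            yield dict(left)


# Hypothetical data: a primary table and a small lookup-like right side.
sales = [{"country": "US", "n": 3}, {"country": "FR", "n": 1}, {"country": "XX", "n": 7}]
countries = [{"code": "US", "name": "United States"}, {"code": "FR", "name": "France"}]

table = build_hash_table(countries, "code")  # built once, reused for every probe
inner = list(hash_join(sales, table, "country"))
left = list(hash_join(sales, table, "country", join_type="LEFT"))
```

Building the table is the expensive step, which is why a datasource with a pre-built table can skip it.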
   
   For more details, please see [docs - TBD](TBD).
   
   https://github.com/apache/druid/pull/8728
   https://github.com/apache/druid/pull/9545
   https://github.com/apache/druid/pull/9111
   
   ## Query laning and prioritization
   
   Query laning allows you to control capacity utilization for heterogeneous query workloads. With laning, the Broker examines and classifies a query for the purpose of assigning it to a 'lane'. Lanes have capacity limits, enforced by the Broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane.
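
The idea of per-lane capacity limits can be sketched as follows. This is a simplified Python model, not the Broker's scheduler: the `LaneLimiter` class, its method names, and the lane name `"low"` are all hypothetical.

```python
class LaneLimiter:
    """Toy admission control: one overall capacity plus a cap per lane."""

    def __init__(self, total_capacity, lane_limits):
        self.total_capacity = total_capacity
        self.lane_limits = dict(lane_limits)
        self.running_total = 0
        self.running_by_lane = {lane: 0 for lane in lane_limits}

    def try_acquire(self, lane=None):
        """Return True if a query may start; lane=None means an un-laned query."""
        if self.running_total >= self.total_capacity:
            return False
        if lane is not None:
            if self.running_by_lane[lane] >= self.lane_limits[lane]:
                return False
            self.running_by_lane[lane] += 1
        self.running_total += 1
        return True

    def release(self, lane=None):
        """Free the slot held by a finished query."""
        self.running_total -= 1
        if lane is not None:
            self.running_by_lane[lane] -= 1


limiter = LaneLimiter(total_capacity=4, lane_limits={"low": 2})
results = [limiter.try_acquire("low") for _ in range(3)]  # third "low" query is rejected
interactive = limiter.try_acquire()  # an un-laned query is still admitted
```

Because the `"low"` lane is capped below the total capacity, slots remain for queries outside that lane.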
   
   Automatic query prioritization determines the query priority based on the configured strategy. The threshold-based prioritization strategy lowers the priority of queries that cross any of a configurable set of thresholds, such as how far in the past the data is, how large an interval a query covers, or the number of segments taking part in a query.
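
A threshold-based strategy of this kind might look like the sketch below. The function name, parameter names, and default values are assumptions chosen for illustration and do not come from Druid's configuration reference.

```python
from datetime import datetime, timedelta


def adjust_priority(priority, interval_start, interval_end, segment_count, now,
                    period_threshold=timedelta(days=30),
                    duration_threshold=timedelta(days=7),
                    segment_threshold=100,
                    adjustment=5):
    """Lower the priority if the query crosses any one of the thresholds."""
    violates = (
        interval_start < now - period_threshold              # data is too far in the past
        or (interval_end - interval_start) > duration_threshold  # interval is too long
        or segment_count > segment_threshold                 # too many segments involved
    )
    return priority - adjustment if violates else priority


now = datetime(2020, 4, 9)
recent = adjust_priority(0, now - timedelta(days=1), now, 10, now)
old = adjust_priority(0, datetime(2019, 1, 1), datetime(2019, 1, 2), 10, now)
```

Here `recent` keeps its priority, while `old` is lowered because its interval starts more than 30 days before `now`.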
   
   See https://github.com/apache/druid/blob/0.18.0/docs/configuration/index.md#query-prioritization-and-laning for more details.
   
   https://github.com/apache/druid/pull/6993
   https://github.com/apache/druid/pull/9407
   https://github.com/apache/druid/pull/9493
   
   ## Query inlining in Brokers
   
   Druid can now execute a nested query by inlining its subqueries. Any type of subquery can sit on top of any other type. For example:
   
   ```
                topN
                  |
          (join datasource)
            /          \
   (table datasource)  groupBy
   ```
   
   To execute this query, the Broker first evaluates the leaf groupBy subquery: it sends the subquery to data nodes and collects the result. The collected result is materialized in Broker memory. Once the Broker has collected all results for the groupBy query, it rewrites the topN query by replacing the leaf groupBy with an inline datasource that holds the result of the groupBy query. The rewritten query is then sent to data nodes to execute the topN query.
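
The rewrite step can be sketched in Python as a recursive walk over the datasource tree. The dict shapes below are deliberately simplified and are not the exact native query JSON; `inline_subqueries` and `fake_execute` are hypothetical names, with `fake_execute` standing in for sending a query to data nodes.

```python
def inline_subqueries(datasource, execute):
    """Replace every query datasource with an inline datasource holding its result."""
    kind = datasource["type"]
    if kind == "query":
        inner = dict(datasource["query"])
        # Inline anything nested below this subquery first, then run it.
        inner["dataSource"] = inline_subqueries(inner["dataSource"], execute)
        return {"type": "inline", "rows": execute(inner)}
    if kind == "join":
        return {**datasource,
                "left": inline_subqueries(datasource["left"], execute),
                "right": inline_subqueries(datasource["right"], execute)}
    return datasource  # table, lookup, inline: nothing to evaluate


executed = []


def fake_execute(query):
    """Stand-in for running a query on data nodes and collecting its rows."""
    executed.append(query["queryType"])
    return [{"k": "a", "cnt": 2}]


top_n = {
    "queryType": "topN",
    "dataSource": {
        "type": "join",
        "left": {"type": "table", "name": "events"},
        "right": {"type": "query",
                  "query": {"queryType": "groupBy",
                            "dataSource": {"type": "table", "name": "events"}}},
    },
}

rewritten = {**top_n, "dataSource": inline_subqueries(top_n["dataSource"], fake_execute)}
```

After the rewrite, only the groupBy has been executed by the Broker, and the topN that remains has a table on the left and an inline datasource on the right.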
   
   ### New dimension in query metrics
   
   Since a native query containing subqueries can be executed part by part, a new "subQueryId" has been introduced. Each subquery has its own subQueryId but shares the same queryId. The subQueryId is available as a new dimension in query metrics.
   
   ### New configuration
   
   A new `druid.server.http.maxSubqueryRows` configuration controls the maximum number of rows materialized in Broker memory.
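
Conceptually, such a limit acts as a guard while subquery results are being collected. The Python sketch below assumes the limit is checked as rows arrive for a single result stream; how Druid actually counts rows (per subquery or per query) is not stated here, and the names `materialize` and `SubqueryRowLimitExceeded` are hypothetical.

```python
class SubqueryRowLimitExceeded(Exception):
    """Raised when materialized subquery results exceed the configured limit."""


def materialize(result_batches, max_subquery_rows):
    """Collect result batches in memory, failing fast once the limit is crossed."""
    rows = []
    for batch in result_batches:
        rows.extend(batch)
        if len(rows) > max_subquery_rows:
            raise SubqueryRowLimitExceeded(
                f"subquery produced more than {max_subquery_rows} rows"
            )
    return rows


small = materialize([[1, 2], [3]], max_subquery_rows=3)

try:
    materialize([[1, 2], [3, 4]], max_subquery_rows=3)
    failed = False
except SubqueryRowLimitExceeded:
    failed = True
```

Failing fast keeps an unexpectedly large subquery result from exhausting Broker memory.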
   
   Please see [docs - TBD](TBD) for more details.
   
   https://github.com/apache/druid/pull/9533
   
   ## SQL grouping sets
   
   GROUPING SETS is now supported, allowing you to combine multiple GROUP BY clauses into one GROUP BY clause. The GROUPING SETS clause is internally translated into a groupBy query with subtotalsSpec. The LIMIT clause is now applied after subtotalsSpec, rather than to each grouping set.
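
To make the semantics concrete, the following Python sketch emulates something like `GROUP BY GROUPING SETS ((country, device), (country), ())` over in-memory rows, with the limit applied once to the combined output. It only illustrates the behavior described above and is not how Druid evaluates the query; the function, column names, and data are hypothetical.

```python
def grouping_sets(rows, dimensions, sets, metric, limit=None):
    """Sum `metric` once per grouping set, then apply the limit to the combined rows."""
    out = []
    for dims in sets:
        totals = {}
        for row in rows:
            # Dimensions outside the current grouping set are reported as None (null).
            key = tuple(row[d] if d in dims else None for d in dimensions)
            totals[key] = totals.get(key, 0) + row[metric]
        out.extend(
            {**dict(zip(dimensions, key)), metric: total}
            for key, total in totals.items()
        )
    # The limit applies after all subtotals are produced, not per grouping set.
    return out if limit is None else out[:limit]


rows = [
    {"country": "US", "device": "ios", "clicks": 1},
    {"country": "US", "device": "web", "clicks": 2},
    {"country": "FR", "device": "web", "clicks": 4},
]
sets = [("country", "device"), ("country",), ()]

everything = grouping_sets(rows, ["country", "device"], sets, "clicks")
limited = grouping_sets(rows, ["country", "device"], sets, "clicks", limit=4)
```

With the limit applied to the combined output, `limited` contains four rows in total rather than up to four rows per grouping set.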
   
   https://github.com/apache/druid/pull/9122
   
   ## SQL Dynamic parameters
   
   Druid now supports dynamic parameters for SQL. To use dynamic parameters, replace any literal in the query with a question mark (`?`) character. These question marks represent the places where the parameters will be bound at execution time. See https://github.com/apache/druid/blob/0.18.0/docs/querying/sql.md#query-syntax for more details.
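
As an illustration, the Python sketch below builds a JSON request body that carries a parameterized query. It assumes the HTTP request takes a `parameters` array of `type`/`value` objects alongside `query`; check the linked documentation for the exact format. The helper name, the type mapping, and the example table and columns are hypothetical, and no request is actually sent.

```python
import json

# Assumed mapping from Python types to SQL type names, for this sketch only.
TYPE_NAMES = {str: "VARCHAR", int: "BIGINT", float: "DOUBLE"}


def build_sql_request(query, values):
    """Pair each `?` placeholder, in order, with a typed parameter value."""
    # Naive count: this does not account for `?` inside string literals.
    placeholders = query.count("?")
    if placeholders != len(values):
        raise ValueError(f"expected {placeholders} parameters, got {len(values)}")
    return {
        "query": query,
        "parameters": [{"type": TYPE_NAMES[type(v)], "value": v} for v in values],
    }


payload = build_sql_request(
    "SELECT COUNT(*) FROM wikipedia WHERE channel = ? AND added > ?",
    ["#en.wikipedia", 100],
)
body = json.dumps(payload)  # the string that would be sent as the request body
```

Binding values this way keeps literals out of the SQL string, so the same query text can be reused with different values.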
   
   https://github.com/apache/druid/pull/6974
   
   # Important Changes
   
   ## New lag metrics for Kinesis
   
   The Kinesis indexing service now supports the following new lag metrics:
   
   - `ingest/{supervisor type}/lag/time`: total time in millis behind the latest offsets of the stream
   - `ingest/{supervisor type}/maxLag/time`: max time in millis behind the latest offsets of the stream
   - `ingest/{supervisor type}/avgLag/time`: avg time in millis behind the latest offsets of the stream
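
The relationship between the three metrics can be sketched as below, assuming each is an aggregate over per-shard time lag. That aggregation, the function name, and the shard IDs are assumptions for illustration only.

```python
def lag_metrics(lag_millis_by_shard):
    """Aggregate per-shard time lag (in millis) into total, max, and average."""
    lags = list(lag_millis_by_shard.values())
    if not lags:
        return {"lag": 0, "maxLag": 0, "avgLag": 0.0}
    return {
        "lag": sum(lags),
        "maxLag": max(lags),
        "avgLag": sum(lags) / len(lags),
    }


metrics = lag_metrics({"shard-0": 1200, "shard-1": 300, "shard-2": 0})
```

A high `maxLag` next to a low `avgLag` would suggest that a single shard is falling behind.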
   
   https://github.com/apache/druid/pull/9509
   
   ## Complex metrics behavior change at ingestion time when SQL-compatible null handling is disabled (default mode)
   
   TBD
   
   https://github.com/apache/druid/pull/9484
   
   ## Roaring bitmaps as default
   
   Druid supports two bitmap types, Roaring and CONCISE. Since Roaring bitmaps provide a better out-of-the-box experience (faster query speed in general), the default bitmap type has been switched to Roaring. See https://github.com/apache/druid/blob/0.18.0/docs/design/segments.md#compression for more details about bitmaps.
   
   https://github.com/apache/druid/pull/9548
   
   ## Array expression syntax change
   
   TBD
   
   https://github.com/apache/druid/pull/9367
   
   ## Enabling pending segments cleanup by default
   
   The Coordinator now periodically cleans up the `pendingSegments` table in the metadata store by default.
   
   https://github.com/apache/druid/pull/9385
   
   ## Creating better input splits for native parallel indexing
   
   The Parallel task can now create better splits. Each split can contain multiple input files, grouped based on their size. Empty files are ignored. The split size can be controlled with the split hint spec. See https://github.com/apache/druid/blob/0.18.0/docs/ingestion/native-batch.md#split-hint-spec for more details.
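
One way to picture size-based splitting is the greedy packing sketch below. It is an assumption about the general approach, not the algorithm Druid uses, and `create_splits` and `max_split_size` are hypothetical names.

```python
def create_splits(files, max_split_size):
    """Greedily pack (name, size) pairs into splits of at most max_split_size bytes."""
    splits, current, current_size = [], [], 0
    for name, size in files:
        if size == 0:
            continue  # empty files are ignored
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        # A file larger than the limit still gets a split of its own.
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


files = [("a", 400), ("b", 0), ("c", 500), ("d", 300), ("e", 1500)]
splits = create_splits(files, max_split_size=1000)
```

Packing several small files into one split avoids creating a separate sub-task for each tiny file.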
   
   https://github.com/apache/druid/pull/9360
   https://github.com/apache/druid/pull/9450
   
   ## Including PostgreSQL JDBC driver and Hadoop AWS library in binary distribution
   
   The PostgreSQL JDBC driver and the Hadoop AWS library are now included in the binary distribution for a better out-of-the-box experience.
   
   https://github.com/apache/druid/pull/9399
   
   ## Transform is now an extension point
   
   [`Transform`](https://github.com/apache/druid/blob/0.18.0/processing/src/main/java/org/apache/druid/segment/transform/Transform.java) is an interface that represents a transformation to be applied to each row at ingestion time. This interface is now an extension point. Please see https://druid.apache.org/docs/0.18.0/development/modules.html#writing-your-own-extensions for how to add your custom Transform.
   
   https://github.com/apache/druid/pull/9319
   
   ## `chunkPeriod` query context is removed
   
   `chunkPeriod` has been deprecated since 0.14.0 because of its limited usefulness (it is only sometimes useful, and only for groupBy v1). This query context parameter has been removed in 0.18.0.
   
   https://github.com/apache/druid/pull/9216
   
   ## Experimental support for Java 11
   
   Druid now experimentally supports Java 11. Our tests on Travis include:
   
   - Compiling and running unit tests with Java 11
   - Compiling with Java 8 and running integration tests with Java 11
   
   No performance testing or soak testing results have been reported yet.
   
   https://github.com/apache/druid/pull/7306
   https://github.com/apache/druid/pull/9249
   
   # Changes in Extensions
   
   ## New Pac4j extension
   
   A new extension is added in 0.18.0 to enable [OpenID Connect](https://openid.net/connect/) based authentication for Druid processes. It can be used with any authentication server that supports OpenID Connect, e.g. [Okta](https://developer.okta.com/). This extension should only be used at the Router node to let a group of users in an existing authentication server interact with the Druid cluster through the [Web Console](../../operations/druid-console.html).
   
   https://github.com/apache/druid/pull/8992
   
   ## Core extension for Azure
   
   The Azure storage extension has been promoted to a core extension. It now also supports cleanup of stale task logs and segments. When deploying 0.18.0, please ensure that your `extensions-contrib` directory does not contain any older versions of the `druid-azure-extensions` extension.
   
   https://github.com/apache/druid/pull/9394
   https://github.com/apache/druid/pull/9523
   
   ## Google Storage extension
   
   The Google storage extension now supports cleanup of stale task logs and segments.
   
   https://github.com/apache/druid/pull/9519
   
   # Security Issues
   
   ## Updating Kafka client to 2.2.2
   
   # Upgrading to Druid 0.18.0
   
   # Known Issues
   
   ## Other known issues
   
   For a full list of open issues, please see https://github.com/apache/druid/labels/Bug.
   
   # Credits
   
   Thanks to everyone who contributed to this release!
   
   @a2l007
   @aditya-r-m
   @AlexanderSaydakov
   @als-sdin
   @aP0StAl
   @asdf2014
   @benhopp
   @bjozet
   @capistrant
   @Caroline1000
   @ccaominh
   @clintropolis
   @dampcake
   @fjy
   @Fokko
   @frnidito
   @gianm
   @himanshug
   @JaeGeunBang
   @jihoonson
   @jon-wei
   @JulianJaffePinterest
   @kou64yama
   @lamber-ken
   @leventov
   @liutang123
   @maytasm
   @mcbrewster
   @mgill25
   @mitchlloyd
   @mrsrinivas
   @nvolungis
   @prabcs
   @samarthjain
   @sthetland
   @suneet-s
   @themaric
   @vogievetsky
   @xvrl
   @zachjsh
   @zhenxiao

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] averma111 commented on issue #9652: [Draft] 0.18.0 release notes

Posted by GitBox <gi...@apache.org>.
averma111 commented on issue #9652: [Draft] 0.18.0 release notes
URL: https://github.com/apache/druid/issues/9652#issuecomment-612292728
 
 
   Great, thank you so much for the information. I will be upgrading from 0.15 to 0.18 in production.



[GitHub] [druid] jihoonson commented on issue #9652: [Draft] 0.18.0 release notes

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9652: [Draft] 0.18.0 release notes
URL: https://github.com/apache/druid/issues/9652#issuecomment-614299441
 
 
   @SlevinBE thanks for pointing it out! I'll add it in the release notes.



[GitHub] [druid] averma111 commented on issue #9652: [Draft] 0.18.0 release notes

Posted by GitBox <gi...@apache.org>.
averma111 commented on issue #9652: [Draft] 0.18.0 release notes
URL: https://github.com/apache/druid/issues/9652#issuecomment-612283508
 
 
   @a2l007 when is the next release, 0.18, coming for general use?



[GitHub] [druid] SlevinBE commented on issue #9652: [Draft] 0.18.0 release notes

Posted by GitBox <gi...@apache.org>.
SlevinBE commented on issue #9652: [Draft] 0.18.0 release notes
URL: https://github.com/apache/druid/issues/9652#issuecomment-613845495
 
 
   This looks like a great release, looking forward to it! 👍 
   Also, when mentioning https://github.com/apache/druid/pull/9523 and https://github.com/apache/druid/pull/9519 in the release notes, it is probably also worth mentioning the S3 support for cleaning up task logs and segments (https://github.com/apache/druid/pull/9459).



[GitHub] [druid] jihoonson commented on issue #9652: [Draft] 0.18.0 release notes

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9652: [Draft] 0.18.0 release notes
URL: https://github.com/apache/druid/issues/9652#issuecomment-612291068
 
 
   Hi @averma111, you can check remaining issues for 0.18.0 [here](https://github.com/apache/druid/milestone/37). Once all issues are closed, I'll create the RC1 for release vote. The vote will be open for at least 72 hours. The release will be done if the vote passes. See https://github.com/apache/druid/blob/master/distribution/asf-release-process-guide.md for more details.
