Posted to commits@druid.apache.org by fj...@apache.org on 2019/05/06 19:31:56 UTC

[incubator-druid] branch master updated: Remove SQL experimental banner and other doc adjustments. (#7591)

This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 727b65c  Remove SQL experimental banner and other doc adjustments. (#7591)
727b65c is described below

commit 727b65c7e5ce536daf2fff58372228d80fd61078
Author: Gian Merlino <gi...@imply.io>
AuthorDate: Mon May 6 12:31:51 2019 -0700

    Remove SQL experimental banner and other doc adjustments. (#7591)
    
    * Remove SQL experimental banner and other doc adjustments.
    
    Also,
    
    - Adjust the ToC and other docs a bit so SQL and native queries are
      presented on more equal footing.
    - De-emphasize querying historicals and peons directly in the
      native query docs. This is a really niche thing and may have been
      confusing to include prominently in the very first paragraph.
    - Remove DataSketches and Kafka indexing service from the experimental
      features ToC. They are not experimental any longer and were there in
      error.
    
    * More notes.
    
    * Slight tweak.
    
    * Remove extra extra word.
    
    * Remove RT node from ToC.
---
 docs/content/development/experimental.md | 19 +++++-----
 docs/content/development/router.md       |  5 +++
 docs/content/querying/aggregations.md    | 17 ++++-----
 docs/content/querying/lookups.md         | 11 ++++++
 docs/content/querying/querying.md        | 33 +++++++++-------
 docs/content/querying/select-query.md    | 16 ++++----
 docs/content/querying/sql.md             | 12 +++---
 docs/content/toc.md                      | 64 +++++++++++++++-----------------
 8 files changed, 99 insertions(+), 78 deletions(-)

diff --git a/docs/content/development/experimental.md b/docs/content/development/experimental.md
index adf4e24..eb3c051 100644
--- a/docs/content/development/experimental.md
+++ b/docs/content/development/experimental.md
@@ -24,16 +24,15 @@ title: "Experimental Features"
 
 # Experimental Features
 
-Experimental features are features we have developed but have not fully tested in a production environment. If you choose to try them out, there will likely be edge cases that we have not covered. We would love feedback on any of these features, whether they are bug reports, suggestions for improvement, or letting us know they work as intended.
+Features often start out in "experimental" status that indicates they are still evolving.
+This can mean any of the following things:
 
-<div class="note caution">
-APIs for experimental features may change in backwards incompatible ways.
-</div>
+1. The feature's API may change even in minor releases or patch releases.
+2. The feature may have known "missing" pieces that will be added later.
+3. The feature may or may not have received full battle-testing in production environments.
 
-To enable experimental features, include their artifacts in the configuration runtime.properties file, e.g.,
+All experimental features are optional.
 
-```
-druid.extensions.loadList=["druid-histogram"]
-```
-
-The configuration files for all the Apache Druid (incubating) processes need to be updated with this.
+Note that not all of these points apply to every experimental feature. Some have been battle-tested in terms of
+implementation, but are still marked experimental due to an evolving API. Please check the documentation for each
+feature for full details.
diff --git a/docs/content/development/router.md b/docs/content/development/router.md
index 3c8f3b7..11508ac 100644
--- a/docs/content/development/router.md
+++ b/docs/content/development/router.md
@@ -24,6 +24,11 @@ title: "Router Process"
 
 # Router Process
 
+<div class="note info">
+The Router is an optional and <a href="../development/experimental.html">experimental</a> feature due to the fact that its recommended place in the Druid cluster architecture is still evolving.
+However, it has been battle-tested in production, and it hosts the powerful [Druid Console](../operations/management-uis.html#druid-console), so you should feel safe deploying it.
+</div>
+
 The Apache Druid (incubating) Router process can be used to route queries to different Broker processes. By default, the broker routes queries based on how [Rules](../operations/rule-configuration.html) are set up. For example, if 1 month of recent data is loaded into a `hot` cluster, queries that fall within the recent month can be routed to a dedicated set of brokers. Queries outside this range are routed to another set of brokers. This set up provides query isolation such that queries [...]
 
 For query routing purposes, you should only ever need the Router process if you have a Druid cluster well into the terabyte range. 
diff --git a/docs/content/querying/aggregations.md b/docs/content/querying/aggregations.md
index 23b333f..a204720 100644
--- a/docs/content/querying/aggregations.md
+++ b/docs/content/querying/aggregations.md
@@ -279,21 +279,19 @@ The [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.ht
 
 Compared to the Theta sketch, the HLL sketch does not support set operations and has slightly slower update and merge speed, but requires significantly less space.
 
-#### Cardinality/HyperUnique (Deprecated)
+#### Cardinality, hyperUnique
 
-<div class="note caution">
-The Cardinality and HyperUnique aggregators are deprecated.
+<div class="note info">
 For new use cases, we recommend evaluating <a href="../development/extensions-core/datasketches-theta.html">DataSketches Theta Sketch</a> or <a href="../development/extensions-core/datasketches-hll.html">DataSketches HLL Sketch</a> instead.
-For existing users, we recommend evaluating the newer DataSketches aggregators and migrating if possible.
+The DataSketches aggregators are generally able to offer more flexibility and better accuracy than the classic Druid `cardinality` and `hyperUnique` aggregators.
 </div>
 
 The [Cardinality and HyperUnique](../querying/hll-old.html) aggregators are older aggregator implementations available by default in Druid that also provide distinct count estimates using the HyperLogLog algorithm. The newer DataSketches Theta and HLL extension-provided aggregators described above have superior accuracy and performance and are recommended instead. 
 
-The DataSketches team has published a [comparison study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we have deprecated Druid's original HLL aggregator.
-
-Please note that `hyperUnique` aggregators are not mutually compatible with Datasketches HLL or Theta sketches. 
+The DataSketches team has published a [comparison study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we are recommending using them in preference to Druid's original HLL-based aggregators.
+However, to ensure backwards compatibility, we will continue to support the classic aggregators.
 
-Although deprecated, we will continue to support the older Cardinality/HyperUnique aggregators for backwards compatibility. 
+Please note that `hyperUnique` aggregators are not mutually compatible with Datasketches HLL or Theta sketches.
 
 ##### Multi-column handling
 
@@ -326,10 +324,11 @@ The fixed buckets histogram can perform well when the distribution of the input
 
 We do not recommend the fixed buckets histogram for general use, as its usefulness is extremely data dependent. However, it is made available for users that have already identified use cases where a fixed buckets histogram is suitable.
 
-#### Approximate Histogram (Deprecated)
+#### Approximate Histogram (deprecated)
 
 <div class="note caution">
 The Approximate Histogram aggregator is deprecated.
+There are a number of other quantile estimation algorithms that offer better performance, accuracy, and memory footprint.
 We recommend using <a href="../development/extensions-core/datasketches-quantiles.html">DataSketches Quantiles</a> instead.
 </div>
 
diff --git a/docs/content/querying/lookups.md b/docs/content/querying/lookups.md
index 68f3287..a072317 100644
--- a/docs/content/querying/lookups.md
+++ b/docs/content/querying/lookups.md
@@ -55,6 +55,17 @@ Other lookup types are available as extensions, including:
 - Globally cached lookups from local files, remote URIs, or JDBC through [lookups-cached-global](../development/extensions-core/lookups-cached-global.html).
 - Globally cached lookups from a Kafka topic through [kafka-extraction-namespace](../development/extensions-core/kafka-extraction-namespace.html).
 
+Query Syntax
+------------
+
+In [Druid SQL](sql.html), lookups can be queried using the `LOOKUP` function, for example:
+
+```
+SELECT LOOKUP(column_name, 'lookup-name'), COUNT(*) FROM datasource GROUP BY 1
+```
+
+In native queries, lookups can be queried with [dimension specs or extraction functions](dimensionspecs.html).
+
 Query Execution
 ---------------
 When executing an aggregation query involving lookups, Druid can decide to apply lookups either while scanning and
diff --git a/docs/content/querying/querying.md b/docs/content/querying/querying.md
index 3470e24..5b6e30e 100644
--- a/docs/content/querying/querying.md
+++ b/docs/content/querying/querying.md
@@ -1,6 +1,6 @@
 ---
 layout: doc_page
-title: "Querying"
+title: "Native queries"
 ---
 
 <!--
@@ -22,26 +22,28 @@ title: "Querying"
   ~ under the License.
   -->
 
-# Querying
+# Native queries
 
-Apache Druid (incubating) queries are made using an HTTP REST style request to queryable processes ([Broker](../design/broker.html),
-[Historical](../design/historical.html). [Peons](../design/peons.html)) that are running stream ingestion tasks can also accept queries. The
-query is expressed in JSON and each of these process types expose the same
-REST query interface. For normal Druid operations, queries should be issued to the Broker processes. Queries can be posted
-to the queryable processes like this -
+<div class="note info">
+Apache Druid (incubating) supports two query languages: [Druid SQL](sql.html) and native queries, which SQL queries
+are planned into, and which end users can also issue directly. This document describes the native query language.
+</div>
 
- ```bash
- curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
- ```
+Native queries in Druid are JSON objects and are typically issued to the Broker or Router processes. Queries can be
+posted like this:
+
+```bash
+curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
+```
  
 Druid's native query language is JSON over HTTP, although many members of the community have contributed different 
 [client libraries](../development/libraries.html) in other languages to query Druid. 
 
 The Content-Type/Accept Headers can also take 'application/x-jackson-smile'.
 
- ```bash
- curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @<query_json_file>
- ```
+```bash
+curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @<query_json_file>
+```
 
 Note: If Accept header is not provided, it defaults to value of 'Content-Type' header.
 
@@ -49,6 +51,11 @@ Druid's native query is relatively low level, mapping closely to how computation
 are designed to be lightweight and complete very quickly. This means that for more complex analysis, or to build 
 more complex visualizations, multiple Druid queries may be required.
 
+Even though queries are typically made to Brokers or Routers, they can also be accepted by
+[Historical](../design/historical.html) processes and by [Peons (task JVMs)](../design/peons.html) that are running
+stream ingestion tasks. This may be valuable if you want to query results for specific segments that are served by
+specific processes.
+
 ## Available Queries
 
 Druid has numerous query types for various use cases. Queries are composed of various JSON properties and Druid has different types of queries for different use cases. The documentation for the various query types describe all the JSON properties that can be set.
diff --git a/docs/content/querying/select-query.md b/docs/content/querying/select-query.md
index e0b7f2e..4c7ba20 100644
--- a/docs/content/querying/select-query.md
+++ b/docs/content/querying/select-query.md
@@ -24,7 +24,15 @@ title: "Select Queries"
 
 # Select Queries
 
-Select queries return raw Apache Druid (incubating) rows and support pagination.
+<div class="note caution">
+We encourage you to use the [Scan query](../querying/scan-query.html) type rather than Select whenever possible.
+In situations involving larger numbers of segments, the Select query can have very high memory and performance overhead.
+The Scan query does not have this issue.
+The major difference between the two is that the Scan query does not support pagination.
+However, the Scan query type is able to return a virtually unlimited number of results even without pagination, making it unnecessary in many cases.
+</div>
+
+Select queries return raw Druid rows and support pagination.
 
 ```json
  {
@@ -41,12 +49,6 @@ Select queries return raw Apache Druid (incubating) rows and support pagination.
  }
 ```
 
-<div class="note info">
-Consider using the [Scan query](../querying/scan-query.html) instead of the Select query if you don't need pagination. 
-The Scan query returns results without pagination but is significantly more efficient in terms of both processing time
-and memory requirements. It is also capable of returning a virtually unlimited number of results.
-</div>
-
 There are several main parts to a select query:
 
 |property|description|required?|
diff --git a/docs/content/querying/sql.md b/docs/content/querying/sql.md
index 4871594..032b101 100644
--- a/docs/content/querying/sql.md
+++ b/docs/content/querying/sql.md
@@ -31,12 +31,12 @@ title: "SQL"
 
 # SQL
 
-<div class="note caution">
-Built-in SQL is an <a href="../development/experimental.html">experimental</a> feature. The API described here is
-subject to change.
+<div class="note info">
+Apache Druid (incubating) supports two query languages: Druid SQL and [native queries](querying.html), which SQL queries
+are planned into, and which end users can also issue directly. This document describes the SQL language.
 </div>
 
-Apache Druid (incubating) SQL is a built-in SQL layer and an alternative to Druid's native JSON-based query language, and is powered by a
+Druid SQL is a built-in SQL layer and an alternative to Druid's native JSON-based query language, and is powered by a
 parser and planner based on [Apache Calcite](https://calcite.apache.org/). Druid SQL translates SQL into native Druid
 queries on the query Broker (the first process you query), which are then passed down to data processes as native Druid
 queries. Other than the (slight) overhead of translating SQL on the Broker, there isn't an additional performance
@@ -125,7 +125,7 @@ Only the COUNT aggregation can accept DISTINCT.
 |`MIN(expr)`|Takes the minimum of numbers.|
 |`MAX(expr)`|Takes the maximum of numbers.|
 |`AVG(expr)`|Averages numbers.|
-|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a regular column or a hyperUnique column. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`.|
+|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a regular column or a hyperUnique column. This is always approximate, regardless of the value of "useApproximateCountDistinct". This uses Druid's builtin "cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT expr)`.|
 |`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct values of expr, which can be a regular column or an [HLL sketch](../development/extensions-core/datasketches-hll.html) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) [...]
 |`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of expr, which can be a regular column or a [Theta sketch](../development/extensions-core/datasketches-theta.html) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use [...]
 |`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate quantiles on numeric or [approxHistogram](../development/extensions-core/approximate-histograms.html#approximate-histogram-aggregator) exprs. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation. Higher resolutions will give more precise results but also have higher overhead. If not provided, the default resolution is 50. The [approximate his [...]
@@ -133,6 +133,8 @@ Only the COUNT aggregation can accept DISTINCT.
 |`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric or [fixed buckets histogram](../development/extensions-core/approximate-histograms.html#fixed-buckets-histogram) exprs. The "probability" should be between 0 and 1 (exclusive). The `numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. The [approximate hi [...]
 |`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced by `expr`, with `numEntries` maximum number of distinct values before false positve rate increases. See [bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.|
 
+For advice on choosing approximate aggregation functions, check out our [approximate aggregations documentation](aggregations.html#approx).
+
 ### Numeric functions
 
 Numeric functions will return 64 bit integers or 64 bit floats, depending on their inputs.
diff --git a/docs/content/toc.md b/docs/content/toc.md
index 0acf7d0..76ef297 100644
--- a/docs/content/toc.md
+++ b/docs/content/toc.md
@@ -70,32 +70,34 @@ layout: toc
   * [Misc. Tasks](/docs/VERSION/ingestion/misc-tasks.html)
 
 ## Querying
-  * [Overview](/docs/VERSION/querying/querying.html)
-  * [Timeseries](/docs/VERSION/querying/timeseriesquery.html)
-  * [TopN](/docs/VERSION/querying/topnquery.html)
-  * [GroupBy](/docs/VERSION/querying/groupbyquery.html)
-  * [Time Boundary](/docs/VERSION/querying/timeboundaryquery.html)
-  * [Segment Metadata](/docs/VERSION/querying/segmentmetadataquery.html)
-  * [DataSource Metadata](/docs/VERSION/querying/datasourcemetadataquery.html)
-  * [Search](/docs/VERSION/querying/searchquery.html)
-  * [Select](/docs/VERSION/querying/select-query.html)
-  * [Scan](/docs/VERSION/querying/scan-query.html)
-  * Components
-    * [Datasources](/docs/VERSION/querying/datasource.html)
-    * [Filters](/docs/VERSION/querying/filters.html)
-    * [Aggregations](/docs/VERSION/querying/aggregations.html)
-    * [Post Aggregations](/docs/VERSION/querying/post-aggregations.html)
-    * [Granularities](/docs/VERSION/querying/granularities.html)
-    * [DimensionSpecs](/docs/VERSION/querying/dimensionspecs.html)
-    * [Context](/docs/VERSION/querying/query-context.html)
-  * [Multi-value dimensions](/docs/VERSION/querying/multi-value-dimensions.html)
-  * [SQL](/docs/VERSION/querying/sql.html)
-  * [Lookups](/docs/VERSION/querying/lookups.html)
-  * [Joins](/docs/VERSION/querying/joins.html)
-  * [Multitenancy](/docs/VERSION/querying/multitenancy.html)
-  * [Caching](/docs/VERSION/querying/caching.html)
-  * [Sorting Orders](/docs/VERSION/querying/sorting-orders.html)
-  * [Virtual Columns](/docs/VERSION/querying/virtual-columns.html)
+  * [Druid SQL](/docs/VERSION/querying/sql.html)
+  * [Native queries](/docs/VERSION/querying/querying.html)
+    * [Timeseries](/docs/VERSION/querying/timeseriesquery.html)
+    * [TopN](/docs/VERSION/querying/topnquery.html)
+    * [GroupBy](/docs/VERSION/querying/groupbyquery.html)
+    * [Time Boundary](/docs/VERSION/querying/timeboundaryquery.html)
+    * [Segment Metadata](/docs/VERSION/querying/segmentmetadataquery.html)
+    * [DataSource Metadata](/docs/VERSION/querying/datasourcemetadataquery.html)
+    * [Search](/docs/VERSION/querying/searchquery.html)
+    * [Scan](/docs/VERSION/querying/scan-query.html)
+    * [Select](/docs/VERSION/querying/select-query.html)
+    * Components
+      * [Datasources](/docs/VERSION/querying/datasource.html)
+      * [Filters](/docs/VERSION/querying/filters.html)
+      * [Aggregations](/docs/VERSION/querying/aggregations.html)
+      * [Post Aggregations](/docs/VERSION/querying/post-aggregations.html)
+      * [Granularities](/docs/VERSION/querying/granularities.html)
+      * [DimensionSpecs](/docs/VERSION/querying/dimensionspecs.html)
+      * [Sorting Orders](/docs/VERSION/querying/sorting-orders.html)
+      * [Virtual Columns](/docs/VERSION/querying/virtual-columns.html)
+      * [Context](/docs/VERSION/querying/query-context.html)
+  * Concepts
+    * [Multi-value dimensions](/docs/VERSION/querying/multi-value-dimensions.html)
+    * [Lookups](/docs/VERSION/querying/lookups.html)
+    * [Joins](/docs/VERSION/querying/joins.html)
+    * [Multitenancy](/docs/VERSION/querying/multitenancy.html)
+    * [Caching](/docs/VERSION/querying/caching.html)
+    * [Geographic Queries](/docs/VERSION/development/geo.html) (experimental)
 
 ## Design
   * [Overview](/docs/VERSION/design/index.html)
@@ -108,7 +110,7 @@ layout: toc
     * [Historical](/docs/VERSION/design/historical.html)
     * [MiddleManager](/docs/VERSION/design/middlemanager.html)
       * [Peons](/docs/VERSION/design/peons.html)
-    * [Realtime (Deprecated)](/docs/VERSION/design/realtime.html)
+    * [Router](/docs/VERSION/development/router.html) (optional; experimental)
   * Dependencies
     * [Deep Storage](/docs/VERSION/dependencies/deep-storage.html)
     * [Metadata Storage](/docs/VERSION/dependencies/metadata-storage.html)
@@ -161,13 +163,7 @@ layout: toc
   * [Build From Source](/docs/VERSION/development/build.html)
   * [Versioning](/docs/VERSION/development/versioning.html)
   * [Integration](/docs/VERSION/development/integrating-druid-with-other-technologies.html)
-  * Experimental Features
-    * [Overview](/docs/VERSION/development/experimental.html)
-    * [Approximate Histograms and Quantiles](/docs/VERSION/development/extensions-core/approximate-histograms.html)
-    * [Datasketches](/docs/VERSION/development/extensions-core/datasketches-extension.html)
-    * [Geographic Queries](/docs/VERSION/development/geo.html)
-    * [Router](/docs/VERSION/development/router.html)
-    * [Kafka Indexing Service](/docs/VERSION/development/extensions-core/kafka-ingestion.html)
+  * [Experimental Features](/docs/VERSION/development/experimental.html)
 
 ## Misc
   * [Druid Expressions Language](/docs/VERSION/misc/math-expr.html)

