Posted to commits@druid.apache.org by da...@apache.org on 2018/08/09 20:42:56 UTC
[incubator-druid] branch master updated: Add docs for virtual columns and transform specs (#6119)
This is an automated email from the ASF dual-hosted git repository.
davidlim pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
The following commit(s) were added to refs/heads/master by this push:
new aa660b8 Add docs for virtual columns and transform specs (#6119)
aa660b8 is described below
commit aa660b87515cc66b40899b46a3a83a37459f9e3b
Author: Jonathan Wei <jo...@users.noreply.github.com>
AuthorDate: Thu Aug 9 13:42:52 2018 -0700
Add docs for virtual columns and transform specs (#6119)
* Add docs for virtual columns and transform specs
* PR Comments
* PR comment
---
docs/content/ingestion/index.md | 11 ++++-
docs/content/ingestion/transform-spec.md | 84 ++++++++++++++++++++++++++++++++
docs/content/misc/math-expr.md | 6 +++
docs/content/querying/virtual-columns.md | 60 +++++++++++++++++++++++
docs/content/toc.md | 3 ++
5 files changed, 163 insertions(+), 1 deletion(-)
diff --git a/docs/content/ingestion/index.md b/docs/content/ingestion/index.md
index 1529455..02fa1cc 100644
--- a/docs/content/ingestion/index.md
+++ b/docs/content/ingestion/index.md
@@ -87,7 +87,8 @@ An example dataSchema is shown below:
"segmentGranularity" : "DAY",
"queryGranularity" : "NONE",
"intervals" : [ "2013-08-31/2013-09-01" ]
- }
+ },
+ "transformSpec" : null
}
```
@@ -97,6 +98,7 @@ An example dataSchema is shown below:
| parser | JSON Object | Specifies how ingested data can be parsed. | yes |
| metricsSpec | JSON Object array | A list of [aggregators](../querying/aggregations.html). | yes |
| granularitySpec | JSON Object | Specifies how to create segments and roll up data. | yes |
+| transformSpec | JSON Object | Specifies how to filter and transform input data. See [transform specs](../ingestion/transform-spec.html).| no |
## Parser
@@ -244,6 +246,9 @@ for the `comment` column.
}
```
+## metricsSpec
+The `metricsSpec` is a list of [aggregators](../querying/aggregations.html). If `rollup` is false in the granularity spec, the metrics spec should be an empty list and all columns should be defined in the `dimensionsSpec` instead (without rollup, there is no real distinction between dimensions and metrics at ingestion time). Doing so is optional, however.
+
## GranularitySpec
The default granularity spec is `uniform`, and can be changed by setting the `type` field.
@@ -270,6 +275,10 @@ This spec is used to generate segments with arbitrary intervals (it tries to cre
| rollup | boolean | rollup or not | no (default == true) |
| intervals | string | A list of intervals for the raw data being ingested. Ignored for real-time ingestion. | no. If specified, batch ingestion tasks may skip determining partitions phase which results in faster ingestion. |
+# Transform Spec
+
+Transform specs allow Druid to transform and filter input data during ingestion. See [Transform specs](../ingestion/transform-spec.html).
+
# IO Config
Stream Push Ingestion: Stream push ingestion with Tranquility does not require an IO Config.
diff --git a/docs/content/ingestion/transform-spec.md b/docs/content/ingestion/transform-spec.md
new file mode 100644
index 0000000..eedaaa6
--- /dev/null
+++ b/docs/content/ingestion/transform-spec.md
@@ -0,0 +1,84 @@
+---
+layout: doc_page
+---
+
+# Transform Specs
+
+Transform specs allow Druid to filter and transform input data during ingestion.
+
+## Syntax
+
+The syntax for the transformSpec is shown below:
+
+```
+"transformSpec": {
+ "transforms": <List of transforms>,
+ "filter": <filter>
+}
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|transforms|A list of [transforms](#transforms) to be applied to input rows. |no|
+|filter|A [filter](../querying/filters.html) that will be applied to input rows; only rows that pass the filter will be ingested.|no|
+
+## Transforms
+
+The `transforms` list allows the user to specify a set of column transformations to be performed on input data.
+
+Transforms allow adding new fields to input rows. Each transform has a "name" (the name of the new field) which can be referred to by DimensionSpecs, AggregatorFactories, etc.
+
+A transform behaves as a "row function", taking an entire row as input and outputting a column value.
+
+If a transform has the same name as a field in an input row, then it will shadow the original field. Transforms that shadow fields may still refer to the fields they shadow. This can be used to transform a field "in-place".
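+
+As a minimal sketch of such an in-place transform, the following expression transform shadows a `page` field with its lowercased value. The `page` field is only an assumed example, and `lower` is taken from the [expression](../misc/math-expr.html) function list:
+
+```
+{
+  "type": "expression",
+  "name": "page",
+  "expression": "lower(page)"
+}
+```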
+
+Transforms do have some limitations. They can only refer to fields present in the actual input rows; in particular, they cannot refer to other transforms. And they cannot remove fields, only add them. However, they can shadow a field with another field containing all nulls, which will act similarly to removing the field.
+
+Note that the transforms are applied before the filter.
+
+### Expression Transform
+
+Druid currently supports one kind of transform, the expression transform.
+
+An expression transform has the following syntax:
+
+```
+{
+ "type": "expression",
+ "name": <output field name>,
+ "expression": <expr>
+}
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|name|The output field name of the expression transform.|yes|
+|expression|An [expression](../misc/math-expr.html) that will be applied to input rows to produce a value for the transform's output field.|no|
+
+For example, the following expression transform prepends "foo" to the values of a `page` column in the input data, and creates a `fooPage` column.
+
+```
+ {
+ "type": "expression",
+ "name": "fooPage",
+ "expression": "concat('foo', page)"
+ }
+```
+
+## Filtering
+
+The transformSpec allows Druid to filter out input rows during ingestion. A row that fails to pass the filter will not be ingested.
+
+Any of Druid's standard [filters](../querying/filters.html) can be used.
+
+Note that the filtering takes place after the transforms, so filters will operate on transformed rows and not the raw input data if transforms are present.
+
+For example, the following filter would ingest only input rows where a `country` column has the value "United States":
+
+```
+"filter": {
+ "type": "selector",
+ "dimension": "country",
+ "value": "United States"
+}
+```
\ No newline at end of file
diff --git a/docs/content/misc/math-expr.md b/docs/content/misc/math-expr.md
index abcebdd..d821491 100644
--- a/docs/content/misc/math-expr.md
+++ b/docs/content/misc/math-expr.md
@@ -2,6 +2,12 @@
layout: doc_page
---
+# Druid Expressions
+
+<div class="note info">
+This feature is still experimental. It has not been optimized for performance yet, and its implementation is known to have significant inefficiencies.
+</div>
+
This expression language supports the following operators (listed in decreasing order of precedence).
|Operators|Description|
diff --git a/docs/content/querying/virtual-columns.md b/docs/content/querying/virtual-columns.md
new file mode 100644
index 0000000..117b75e
--- /dev/null
+++ b/docs/content/querying/virtual-columns.md
@@ -0,0 +1,60 @@
+---
+layout: doc_page
+---
+
+# Virtual Columns
+
+Virtual columns are queryable column "views" created from a set of columns during a query.
+
+A virtual column can potentially draw from multiple underlying columns, although a virtual column always presents itself as a single column.
+
+Virtual columns can be used as dimensions or as inputs to aggregators.
+
+Each Druid query can accept a list of virtual columns as a parameter. The following scan query is provided as an example:
+
+```
+{
+ "queryType": "scan",
+ "dataSource": "page_data",
+ "columns":[],
+ "virtualColumns": [
+ {
+ "type": "expression",
+ "name": "fooPage",
+ "expression": "concat('foo', page)",
+ "outputType": "STRING"
+ },
+ {
+ "type": "expression",
+ "name": "tripleWordCount",
+ "expression": "wordCount * 3",
+ "outputType": "LONG"
+ }
+ ],
+ "intervals": [
+ "2013-01-01/2019-01-02"
+ ]
+}
+```
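+
+Because virtual columns can also serve as inputs to aggregators, a query along the following lines should be possible as well. This is only a sketch: it reuses the assumed `page_data` datasource and `wordCount` column from the example above, and the aggregator name `tripleWordCountSum` is illustrative.
+
+```
+{
+  "queryType": "timeseries",
+  "dataSource": "page_data",
+  "granularity": "all",
+  "virtualColumns": [
+    {
+      "type": "expression",
+      "name": "tripleWordCount",
+      "expression": "wordCount * 3",
+      "outputType": "LONG"
+    }
+  ],
+  "aggregations": [
+    {
+      "type": "longSum",
+      "name": "tripleWordCountSum",
+      "fieldName": "tripleWordCount"
+    }
+  ],
+  "intervals": [
+    "2013-01-01/2019-01-02"
+  ]
+}
+```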
+
+
+## Virtual Column Types
+
+### Expression virtual column
+
+The expression virtual column has the following syntax:
+
+```
+{
+ "type": "expression",
+ "name": <name of the virtual column>,
+ "expression": <row expression>,
+ "outputType": <output value type of expression>
+}
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|name|The name of the virtual column.|yes|
+|expression|An [expression](../misc/math-expr.html) that takes a row as input and outputs a value for the virtual column.|yes|
+|outputType|The expression's output will be coerced to this type. Can be LONG, FLOAT, DOUBLE, or STRING.|no, default is FLOAT|
\ No newline at end of file
diff --git a/docs/content/toc.md b/docs/content/toc.md
index 6553e96..a8cd7ed 100644
--- a/docs/content/toc.md
+++ b/docs/content/toc.md
@@ -32,6 +32,7 @@ layout: toc
* [Stream Pull](/docs/VERSION/ingestion/stream-pull.html)
* [Updating Existing Data](/docs/VERSION/ingestion/update-existing-data.html)
* [Ingestion Tasks](/docs/VERSION/ingestion/tasks.html)
+ * [Transform Specs](/docs/VERSION/ingestion/transform-spec.html)
* [FAQ](/docs/VERSION/ingestion/faq.html)
## Querying
@@ -60,6 +61,7 @@ layout: toc
* [Multitenancy](/docs/VERSION/querying/multitenancy.html)
* [Caching](/docs/VERSION/querying/caching.html)
* [Sorting Orders](/docs/VERSION/querying/sorting-orders.html)
+ * [Virtual Columns](/docs/VERSION/querying/virtual-columns.html)
## Design
* [Overview](/docs/VERSION/design/design.html)
@@ -127,5 +129,6 @@ layout: toc
## Misc
+ * [Druid Expressions Language](/docs/VERSION/misc/math-expr.html)
* [Papers & Talks](/docs/VERSION/misc/papers-and-talks.html)
* [Thanks](/thanks.html)