You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/02/02 06:09:36 UTC

[GitHub] [druid] clintropolis commented on a diff in pull request #13736: docs: sql unnest and cleanup unnest datasource

clintropolis commented on code in PR #13736:
URL: https://github.com/apache/druid/pull/13736#discussion_r1094073098


##########
docs/querying/multi-value-dimensions.md:
##########
@@ -147,6 +147,12 @@ only row1, and generate a result with three groups: `t1`, `t2`, and `t3`. If you
 your filter, you can use a [filtered dimensionSpec](dimensionspecs.md#filtered-dimensionspecs). This can also
 improve performance.
 
+## Unnesting
+
+You can unnest a column that contains multi-value dimensions (arrays) by using either the [UNNEST function (SQL)](../querying/sql.md#unnest) and the helper function MV_TO_ARRAY or the [`unnest` datasource (native)](../querying/datasource.md#unnest) .

Review Comment:
   > ... multi-value dimensions (arrays) ...
   
   please do not conflate multi-value dimensions with actual druid array types, they are totally separate internally. Multi-value strings are represented internally as the native `STRING` type and show up as `VARCHAR` in the SQL schema. Druid _also_ has `ARRAY` types, such as `ARRAY<STRING>`, `ARRAY<LONG>`, etc, that are not well documented yet _on purpose_ to not lock ourselves into a specific behavior prematurely.
   
   Also, its very nearly pointless to use `UNNEST` on a multi-value string column, because all multi-value strings have an implicit unnesting that occurs when grouping. The only case for using on a `STRING` is as part of a scan query
   



##########
docs/querying/sql-functions.md:
##########
@@ -1357,6 +1357,13 @@ Truncates a numerical expression to a specific number of decimal digits.
 
 Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value.
 
+## UNNEST
+
+`UNNEST(source)) as UNNESTED (target)`
+
+Unnests a source column that includes arrays (multi-value dimensions) into a target column.

Review Comment:
   again please don't conflate multi-value strings with arrays, they are not arrays, they just have special functions that allow interacting with them as if they were actual arrays.



##########
docs/querying/sql.md:
##########
@@ -82,6 +83,27 @@ FROM clause, metadata tables are not considered datasources. They exist only in
 For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.md)
 documentation.
 
+## UNNEST
+
+The UNNEST clause unnests values stored in arrays within a column. It's the SQL equivalent to the [unnest datasource](./datasource.md#unnest).

Review Comment:
   unnest isn't limited to working on arrays in a column (we don't currently have any array typed columns). `MV_TO_ARRAY` effectively allows casting a multi-value string to an `ARRAY<STRING>`, but arrays could also come from other virtual columns, such as anything using the `ARRAY` constructor or anything that can build arrays, such as `ARRAY_AGG`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org