Posted to commits@druid.apache.org by "vtlim (via GitHub)" <gi...@apache.org> on 2023/05/17 20:59:28 UTC

[GitHub] [druid] vtlim commented on a diff in pull request #14161: docs: remove the note about segments

vtlim commented on code in PR #14161:
URL: https://github.com/apache/druid/pull/14161#discussion_r1197044178


##########
docs/querying/sql-data-types.md:
##########
@@ -27,7 +27,7 @@ sidebar_label: "SQL data types"
 > This document describes the SQL language.
 
 
-Columns in Druid are associated with a specific data type. This topic describes supported data types in [Druid SQL](./sql.md). 
+Druid associates columns with a specific data type. This topic describes supported data types in [Druid SQL](./sql.md).

Review Comment:
   We should be consistent in singular/plural (I realize this wasn't your change). Maybe something like "Druid associates each column with..."



##########
docs/querying/sql-data-types.md:
##########
@@ -67,53 +66,51 @@ The following table describes how Druid maps SQL types onto native types when ru
 |SMALLINT|LONG|`0`||
 |INTEGER|LONG|`0`||
 |BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
-|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
-|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
-|ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See the [`ARRAY` details](#arrays).|
+|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting&mdash;for example, `2000-01-02 03:04:05`&mdash;not ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
+|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting&mdash;for example, `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
+|ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See [Arrays](#arrays) for more information.|
 |OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
 
 <sup>*</sup> Default value applies if `druid.generic.useDefaultValueForNull = true` (the default mode). Otherwise, the default value is `NULL` for all types.
 
 ## Multi-value strings
 
 Druid's native type system allows strings to potentially have multiple values. These
-[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as `VARCHAR` typed, and can be
-syntactically used like any other `VARCHAR`. Regular string functions that refer to multi-value string dimensions are
+[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as VARCHAR typed, and can be
+syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are
 applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
 [multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations, but retain
-their `VARCHAR` typing and behavior.
+their VARCHAR typing and behavior.
 
 Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to
-an implicit SQL `UNNEST`. Refer to the documentation on [multi-value string dimensions](multi-value-dimensions.md)
+the implicit SQL UNNEST operator. Refer to the documentation on [multi-value string dimensions](multi-value-dimensions.md)

Review Comment:
   I think this changes the meaning slightly. If I understand correctly, UNNEST isn't implicit in itself; the original text said the grouping behavior is similar to an implicit UNNEST.
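
To make the distinction concrete, here is a minimal sketch in plain Python (not Druid code). It assumes, as the doc text describes, that grouping on a multi-value dimension behaves as if each row were expanded into one row per value before grouping. The rows, the `tags` dimension, and the helper name are hypothetical.

```python
from collections import Counter

# Hypothetical rows, each carrying a multi-value string dimension "tags".
rows = [
    {"tags": ["a", "b"]},
    {"tags": ["a"]},
    {"tags": ["b", "c"]},
]


def group_count_multi_value(rows, dim):
    """Count rows per value of `dim`, expanding multi-value rows first."""
    counts = Counter()
    for row in rows:
        # This inner loop is the "implicit unnest": the query never asks
        # for it, but each value of the row contributes to its own group.
        for value in row[dim]:
            counts[value] += 1
    return dict(counts)


result = group_count_multi_value(rows, "tags")
print(result)
```

The expansion happens as a side effect of grouping, which is why "similar to an implicit UNNEST" describes the behavior while "the implicit UNNEST operator" suggests the operator itself is implicit.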



##########
docs/querying/sql-data-types.md:
##########
@@ -67,53 +66,51 @@ The following table describes how Druid maps SQL types onto native types when ru
 |SMALLINT|LONG|`0`||
 |INTEGER|LONG|`0`||
 |BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
-|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
-|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
-|ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See the [`ARRAY` details](#arrays).|
+|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting&mdash;for example, `2000-01-02 03:04:05`&mdash;not ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|

Review Comment:
   ```suggestion
   |TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting&mdash;for example, `2000-01-02 03:04:05`&mdash;not ISO 8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
   ```
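
For reference, a small sketch of the two string shapes the row contrasts. This is plain Python and only illustrates the formats; it makes no claim about how Druid parses casts internally. The helper names and the ISO 8601 sample string are assumptions for illustration.

```python
from datetime import datetime

# Standard SQL timestamp shape used in the doc example: date, space, time.
SQL_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_sql_timestamp(text):
    """Parse a string in the SQL shape; raises ValueError otherwise."""
    return datetime.strptime(text, SQL_FORMAT)


def is_sql_timestamp(text):
    """Return True only if `text` matches the SQL shape exactly."""
    try:
        parse_sql_timestamp(text)
        return True
    except ValueError:
        return False


parsed = parse_sql_timestamp("2000-01-02 03:04:05")
print(parsed)
# An ISO 8601 string uses a "T" separator, so it does not match the SQL shape.
print(is_sql_timestamp("2000-01-02T03:04:05Z"))
```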



##########
docs/querying/sql-data-types.md:
##########
@@ -67,53 +66,51 @@ The following table describes how Druid maps SQL types onto native types when ru
 |SMALLINT|LONG|`0`||
 |INTEGER|LONG|`0`||
 |BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
-|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
-|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
-|ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See the [`ARRAY` details](#arrays).|
+|TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting&mdash;for example, `2000-01-02 03:04:05`&mdash;not ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
+|DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting&mdash;for example, `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
+|ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See [Arrays](#arrays) for more information.|
 |OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
 
 <sup>*</sup> Default value applies if `druid.generic.useDefaultValueForNull = true` (the default mode). Otherwise, the default value is `NULL` for all types.
 
 ## Multi-value strings
 
 Druid's native type system allows strings to potentially have multiple values. These
-[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as `VARCHAR` typed, and can be
-syntactically used like any other `VARCHAR`. Regular string functions that refer to multi-value string dimensions are
+[multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as VARCHAR typed, and can be
+syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are
 applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
 [multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations, but retain
-their `VARCHAR` typing and behavior.
+their VARCHAR typing and behavior.
 
 Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to
-an implicit SQL `UNNEST`. Refer to the documentation on [multi-value string dimensions](multi-value-dimensions.md)
+the implicit SQL UNNEST operator. Refer to the documentation on [multi-value string dimensions](multi-value-dimensions.md)
 for additional details.
 
-> Because multi-value dimensions are treated by the SQL planner as `VARCHAR`, there are some inconsistencies between how
-> they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
-> incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
-> `false`, even though it is possible for a single row to have both "a" and "b" as values for `multi_val_dim`. The
-> SQL behavior of multi-value dimensions may change in a future release to more closely align with their behavior
-> in native queries, but the [multi-value string functions](./sql-multivalue-string-functions.md) should be able to provide
-> nearly all possible native functionality.
+> Because the SQL planner treats multi-value dimensions as VARCHAR, there are some inconsistencies between how they are handled in Druid SQL and in native queries. For instance, expressions involving multi-value dimensions may be incorrectly optimized by the Druid SQL planner. For example, `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
+`false`, even though it is possible for a single row to have both `'a'` and `'b'` as values for `multi_val_dim`.
+>
+> The SQL behavior of multi-value dimensions may change in a future release to more closely align with their behavior in native queries, but the [multi-value string functions](./sql-multivalue-string-functions.md) should be able to provide nearly all possible native functionality.
 
 ## Arrays
-Druid supports `ARRAY` types constructed at query time, though it currently lacks the ability to store them in

Review Comment:
   Should this have been removed?
   >though it currently lacks the ability to store them in segments



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

