Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/04 21:39:16 UTC

[GitHub] [druid] gianm opened a new pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

gianm opened a new pull request #11549:
URL: https://github.com/apache/druid/pull/11549


   1) Dimensions exist that aren't strings, so say "string" explicitly when that's what we mean.
   2) Emphasize that `size` isn't actually the storage size. (See also: #7124)
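
For context, here is a minimal sketch of a segmentMetadata query that requests the three analyses this PR touches. The datasource name and interval are hypothetical placeholders, and the helper function is purely illustrative; the field names follow the segmentMetadata query docs as I understand them, so please check them against your Druid version.

```python
import json


def build_segment_metadata_query(datasource, interval):
    """Build a segmentMetadata query body as a plain dict.

    `datasource` and `interval` are supplied by the caller; the values used
    below are placeholders, not names taken from this PR.
    """
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
        # The three analyses whose documentation this PR clarifies.
        "analysisTypes": ["cardinality", "minmax", "size"],
        # With merge set, per-segment results are combined into one result.
        "merge": True,
    }


# Hypothetical datasource and interval, for illustration only.
query = build_segment_metadata_query("sample_datasource", "2021-01-01/2021-02-01")
print(json.dumps(query, indent=2))
```

The printed JSON is what would be sent as the query body; nothing here contacts a Druid cluster.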


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] techdocsmith commented on a change in pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on a change in pull request #11549:
URL: https://github.com/apache/druid/pull/11549#discussion_r694384889



##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,27 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` is the number of unique values present in string columns. It is null for other column types.
+
+This value is computed by examining the size of string column dictionaries. There is one dictionary per column per

Review comment:
       ```suggestion
   Druid examines the size of string column dictionaries to compute the cardinality value. There is one dictionary per column per
   ```






[GitHub] [druid] paul-rogers commented on a change in pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #11549:
URL: https://github.com/apache/druid/pull/11549#discussion_r682983402



##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.
 
 ### minmax
 
-* Estimated min/max values for each column. Only relevant for dimension columns.
+* Estimated min/max values for each column. Only reported for string columns.
 
 ### size
 
-* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format. This is _not_ the actual storage size of the column in Druid.

Review comment:
       Could you add a pointer to where I can find the actual storage size? I want to know how much space the column takes so I can judge whether it is worth the cost to store.

##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.

Review comment:
       This is not clear to us newbies. Does "max" mean the largest number of any segment, or the aggregated total across segments? Both are useful: if I have 1M rows, and see a cardinality of 1K, that could mean 1K total, or 1K per segment, which says something else if each segment has 1K rows...






[GitHub] [druid] techdocsmith commented on pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on pull request #11549:
URL: https://github.com/apache/druid/pull/11549#issuecomment-906790536


   @gianm, @paul-rogers if we need more clarification, let's pick that up in a new PR. Thanks.




[GitHub] [druid] paul-rogers commented on a change in pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #11549:
URL: https://github.com/apache/druid/pull/11549#discussion_r682982767



##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.

Review comment:
       This is not clear to us newbies. Does "max" mean the largest number of any segment, or the aggregated total across segments? Both are useful: if I have 1M rows, and see a cardinality of 1K, that could mean either A) 1K total, or B) 1K per segment. If there are 100K rows per segment, 1K per segment says one thing. If there are 1K rows per segment, then a cardinality of 1K per segment says something else.
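
The two readings in the question above can be made concrete with a small sketch. It assumes "max" is read literally, as the largest per-segment unique-value count, and contrasts that with the total distinct count across all segments. The function names and sample data are hypothetical and are not Druid code.

```python
def per_segment_cardinalities(segments):
    """Count the unique values in each segment's column independently."""
    return [len(set(values)) for values in segments]


def merged_max_cardinality(segments):
    """Reading A: the largest per-segment cardinality (a literal "max")."""
    return max(per_segment_cardinalities(segments), default=0)


def total_distinct(segments):
    """Reading B: unique values across all segments combined."""
    return len(set().union(*segments))


# Three toy segments of one string column; "c" appears in two of them.
segments = [["a", "b", "c"], ["c", "d"], ["e"]]

print(per_segment_cardinalities(segments))
print(merged_max_cardinality(segments))
print(total_distinct(segments))
```

Under the literal reading, the merged value is a lower bound on the total distinct count, and the two agree only when one segment already contains every value.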






[GitHub] [druid] techdocsmith merged pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

Posted by GitBox <gi...@apache.org>.
techdocsmith merged pull request #11549:
URL: https://github.com/apache/druid/pull/11549


   






[GitHub] [druid] gianm commented on a change in pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11549:
URL: https://github.com/apache/druid/pull/11549#discussion_r686977832



##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.
 
 ### minmax
 
-* Estimated min/max values for each column. Only relevant for dimension columns.
+* Estimated min/max values for each column. Only reported for string columns.
 
 ### size
 
-* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format. This is _not_ the actual storage size of the column in Druid.

Review comment:
       I pushed some new content that hopefully clears this up. Please let me know if it makes sense to you.

##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.

Review comment:
       I pushed some new content that hopefully clears this up. Please let me know if it makes sense to you.



