Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/04 21:49:59 UTC

[GitHub] [druid] paul-rogers commented on a change in pull request #11549: Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

paul-rogers commented on a change in pull request #11549:
URL: https://github.com/apache/druid/pull/11549#discussion_r682983402



##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.
 
 ### minmax
 
-* Estimated min/max values for each column. Only relevant for dimension columns.
+* Estimated min/max values for each column. Only reported for string columns.
 
 ### size
 
-* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format. This is _not_ the actual storage size of the column in Druid.

Review comment:
       Could you add a pointer to where I might find the actual storage size? I want to know how much space the column takes so I can decide whether it is worth the cost to store.
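
For context, a minimal sketch of a query body that requests the three analysis types touched by this hunk. The datasource name and interval are hypothetical placeholders, and the helper function is illustrative only. Per the proposed wording, the `size` this returns is the text-format estimate, not the on-disk column size asked about above.

```python
import json


def build_segment_metadata_query(data_source, interval, merge=True):
    """Build a segmentMetadata query body as a plain dict.

    data_source and interval are caller-supplied; the values used below
    are made-up placeholders, not names from the PR.
    """
    return {
        "queryType": "segmentMetadata",
        "dataSource": data_source,
        "intervals": [interval],
        # The three analyses discussed in this doc change.
        "analysisTypes": ["cardinality", "minmax", "size"],
        # When true, per-segment results are merged into one result.
        "merge": merge,
    }


query = build_segment_metadata_query(
    "sample_datasource", "2021-01-01/2021-02-01"
)
print(json.dumps(query, indent=2))
```

The printed JSON is what would be posted to the Druid query endpoint; how the request is sent is left out because it depends on the deployment.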

##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.

Review comment:
       This is not clear to us newbies. Does "max" mean the largest value reported by any single segment, or the aggregated total across segments? Both are useful: if I have 1M rows and see a cardinality of 1K, that could mean 1K total or 1K per segment, which says something else if each segment has 1K rows...
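
To make the two readings concrete, here is a small sketch that contrasts them on made-up data. It does not call Druid; the segment contents and function names are hypothetical, and the first function only models the "max of this value across segments" wording in the proposed doc text.

```python
def max_per_segment_cardinality(segments):
    """Largest distinct-value count found in any single segment."""
    return max((len(set(values)) for values in segments), default=0)


def total_distinct_across_segments(segments):
    """Distinct-value count over the union of all segments."""
    return len(set().union(*segments))


# Two hypothetical segments of one string column; "c" appears in both.
segments = [
    ["a", "b", "c", "a"],
    ["c", "d"],
]

per_segment_max = max_per_segment_cardinality(segments)
overall_distinct = total_distinct_across_segments(segments)
print(per_segment_max, overall_distinct)
```

Here the first segment has three distinct values and the second has two, so the per-segment maximum is 3, while the union across both segments has 4 distinct values. The two numbers coincide only when one segment already contains every value.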




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


