Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/15 22:54:46 UTC

[GitHub] [druid] JulianJaffePinterest opened a new issue #9707: Add IS_MULTIVALUE and COMPLEX_TYPE columns to INFORMATION_SCHEMA.COLUMNS

URL: https://github.com/apache/druid/issues/9707
 
 
   ### Description
   
   Currently, the `INFORMATION_SCHEMA.COLUMNS` table in Druid does not expose complete information about the columns in a datasource: specifically, whether a column can contain multi-valued entries and which complex serde a metric uses. As a result, users who need this metadata must either fall back on (potentially expensive) SegmentMetadataQueries or implement complicated handling logic.
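   
   For context, a minimal sketch of the SegmentMetadataQuery fallback described above, assuming a broker reachable at `localhost:8082` and a hypothetical datasource named `wikipedia`:
   
   ```python
   # Minimal sketch of the SegmentMetadataQuery fallback. The broker address and
   # datasource name are placeholder assumptions.
   import requests

   query = {
       "queryType": "segmentMetadata",
       "dataSource": "wikipedia",
       "merge": True,                     # merge per-segment analyses into one result
       "analysisTypes": ["aggregators"],  # also report aggregator (complex type) info
   }

   resp = requests.post("http://localhost:8082/druid/v2/", json=query)
   resp.raise_for_status()

   for analysis in resp.json():
       for name, column in analysis["columns"].items():
           # hasMultipleValues is only true if a *queried* segment actually
           # contains multiple values for this column
           print(name, column["type"], column.get("hasMultipleValues"))
   ```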
   
   ### Motivation
   
   It is currently difficult to obtain complete metadata about a Druid datasource. The only way to determine whether a column can contain multiple values, or which complex serde was used to encode a metric, is to use SegmentMetadataQueries. These queries can be expensive (if aggregating over a large number of segments) and can miss information if not all segments are queried: columns that don't appear in a given segment won't be returned by a SegmentMetadataQuery, and columns that don't have multiple values _in a queried segment_ will return `..."hasMultipleValues" : false,...`. This can be worked around to some degree by merging results, but that reintroduces the problem that SegmentMetadataQueries are expensive when run over many segments.
   
   If the `INFORMATION_SCHEMA.COLUMNS` table were extended to indicate whether a column can contain multiple values and which serde was used for a complex metric, tools that interact with Druid metadata, such as the [Calcite Druid adapter](https://calcite.apache.org/docs/druid_adapter.html), proposed Spark and Hive readers, and other third-party integrations, could issue simple SQL queries to determine datasource metadata instead of relying on SegmentMetadataQueries (see the sketch below). This would also align with the Druid recommendation to use the `INFORMATION_SCHEMA` tables for metadata when using SQL.
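   
   As an illustration of the proposal, a sketch of the query a consumer could issue once the new columns exist (`IS_MULTIVALUE` and `COMPLEX_TYPE` are the proposed names from this issue, not existing columns; the broker address and datasource name are placeholders):
   
   ```python
   # Sketch of the proposed SQL-based alternative. IS_MULTIVALUE and COMPLEX_TYPE
   # are the columns proposed by this issue and do not exist yet; the broker
   # address and datasource name are placeholder assumptions.
   import requests

   sql = """
   SELECT COLUMN_NAME, DATA_TYPE, IS_MULTIVALUE, COMPLEX_TYPE
   FROM INFORMATION_SCHEMA.COLUMNS
   WHERE TABLE_SCHEMA = 'druid' AND TABLE_NAME = 'wikipedia'
   """

   resp = requests.post(
       "http://localhost:8082/druid/v2/sql/",
       json={"query": sql, "resultFormat": "object"},
   )
   resp.raise_for_status()

   for row in resp.json():
       print(row["COLUMN_NAME"], row["DATA_TYPE"], row["IS_MULTIVALUE"], row["COMPLEX_TYPE"])
   ```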
   
