Posted to issues@drill.apache.org by "Angappan Ganesh (JIRA)" <ji...@apache.org> on 2016/01/08 10:45:39 UTC

[jira] [Commented] (DRILL-3524) Drill proper DESCRIBE support for MongoDB

    [ https://issues.apache.org/jira/browse/DRILL-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088976#comment-15088976 ] 

Angappan Ganesh commented on DRILL-3524:
----------------------------------------

This would be a good feature to implement, since it would make it easier to integrate Drill with existing BI systems such as Oracle BI and Tableau. BI systems traditionally work with RDBMSs: they need metadata up front so that the user can drag and drop columns and run analytics on the selected columns. Some drivers on the market already do this: they scan the entire MongoDB collection and create a schema. If this functionality were present in Drill, Drill would be an obvious choice for integrating with existing BI systems.
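
As a rough illustration of the scan-and-merge approach described above, here is a minimal Python sketch. It is not Drill code and not how any particular driver works: the function name infer_schema and the output layout are hypothetical, documents are modelled as plain dicts, and only top-level fields are considered.

```python
from collections import defaultdict


def infer_schema(docs):
    """Merge the top-level fields of an iterable of dict-like documents.

    Hypothetical helper for illustration only. Returns a dict mapping each
    field name to the type names observed for it and a nullable flag.
    """
    types = defaultdict(set)  # field name -> set of observed type names
    seen = defaultdict(int)   # field name -> number of docs containing it
    total = 0
    for doc in docs:
        total += 1
        for name, value in doc.items():
            seen[name] += 1
            types[name].add(type(value).__name__)

    schema = {}
    for name in types:
        # A field is treated as nullable if some document lacks it
        # or stores an explicit null (None) in it.
        nullable = seen[name] < total or "NoneType" in types[name]
        schema[name] = {"types": sorted(types[name]), "nullable": nullable}
    return schema
```

A field that appears with more than one type keeps all of them in the merged result, so a BI tool (or a person) can decide how to map it to a single column type.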

> Drill proper DESCRIBE support for MongoDB
> -----------------------------------------
>
>                 Key: DRILL-3524
>                 URL: https://issues.apache.org/jira/browse/DRILL-3524
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - MongoDB
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>             Fix For: Future
>
>
> Request to add full DESCRIBE support for MongoDB collections.
> I understand this may be difficult or sub-optimal because of the flexible schema of MongoDB documents. However, if Drill can tabulate results when reading directly from MongoDB, it has already read the field names, so it should also be possible to extract all field names and present them for the DESCRIBE command, albeit via an inefficient scan.
> Currently DESCRIBE returns pseudo, inaccurate and unhelpful metadata:
> {code}+--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | *            | ANY        | YES          |
> +--------------+------------+--------------+{code}
> Perhaps you could extend DESCRIBE to scan the first few dozen docs by default and build a merged schema. An optional argument to the DESCRIBE command could allow scanning a user-specified number of docs from which to describe the schema, and an ALL keyword could make DESCRIBE scan all docs in a collection to get the complete global schema for the collection.
> In case of schema evolution, it might be an interesting option to additionally read the newest and oldest records, for example the first and last records by ID.
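
The sampling options proposed in the description (a default sample, a user-specified count, ALL, and optionally the oldest and newest records by ID) could be sketched in Python as follows. This is only an illustration under stated assumptions: select_sample, its parameters, and the default of 50 documents are hypothetical, documents are plain dicts held in memory, and the ID field is assumed to be _id and to be orderable.

```python
def select_sample(docs, limit=50, include_extremes=False, id_field="_id"):
    """Return the documents a DESCRIBE-style scan would inspect.

    Hypothetical helper for illustration only.
    limit=None stands in for the proposed ALL keyword (scan everything).
    """
    docs = list(docs)
    if limit is None:
        return docs

    # Default behaviour: only the first `limit` documents.
    sample = docs[:limit]

    if include_extremes and docs:
        # Here the whole list is searched for the smallest and largest ID;
        # a real store would more likely use an index on the ID field.
        oldest = min(docs, key=lambda d: d[id_field])
        newest = max(docs, key=lambda d: d[id_field])
        for doc in (oldest, newest):
            if doc not in sample:
                sample.append(doc)
    return sample
```

The documents returned by such a helper would then be fed to whatever routine merges field names into a schema; a larger limit trades scan time for a more complete result.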



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)