Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2018/09/12 16:31:00 UTC
[jira] [Commented] (DRILL-6552) Drill Metadata management "Drill MetaStore"

    [ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612437#comment-16612437 ] 

Vitalii Diravka commented on DRILL-6552:
----------------------------------------

[~weijie] [~paul-rogers]
We had a discussion in Hangouts and presented some slides there: https://docs.google.com/presentation/d/1m8Hxnwv3PtgIDfNsptCWwA_UYpTq_t3yyvj4P-7NuFs/edit#slide=id.p
Currently, we are working on a design doc for Drill Metastore.
The general ideas are: 
* define a Drill Metastore API (this will allow adding new implementations in the future, such as HBase, MapR-DB, etc.)
* adapt the current Parquet metadata cache files to it (this will also make it easier to create similar metadata cache files for other storage formats)
* add an implementation that uses Hive Metastore (HMS) in Drill Metastore
* implement metadata collection for different storages by leveraging the custom operators from Gautam's work in DRILL-1328
* implement a custom JSON schema reader for exploring JSON schemas.
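
To illustrate the first point, here is a minimal sketch of what a pluggable metastore API could look like. Every name in it (Metastore, TableMetadata, InMemoryMetastore) is a hypothetical placeholder and not the API from the design doc; the only point is that callers depend on the interface, so an HMS, HBase or MapR-DB backend could be swapped in later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// All names here are hypothetical; this is not Drill's actual API.

/** Immutable table-level metadata: column types plus one simple statistic. */
final class TableMetadata {
    private final Map<String, String> columnTypes;
    private final long rowCount;

    TableMetadata(Map<String, String> columnTypes, long rowCount) {
        // Defensive copy so later changes to the caller's map are not visible here.
        this.columnTypes = Collections.unmodifiableMap(new HashMap<>(columnTypes));
        this.rowCount = rowCount;
    }

    Map<String, String> columnTypes() { return columnTypes; }
    long rowCount() { return rowCount; }
}

/** Storage-agnostic contract that the rest of the engine would depend on. */
interface Metastore {
    void putTableMetadata(String tableName, TableMetadata metadata);
    Optional<TableMetadata> getTableMetadata(String tableName);
}

/** Trivial backend; an HMS- or HBase-backed class would implement the same interface. */
final class InMemoryMetastore implements Metastore {
    private final Map<String, TableMetadata> tables = new ConcurrentHashMap<>();

    @Override
    public void putTableMetadata(String tableName, TableMetadata metadata) {
        tables.put(tableName, metadata);
    }

    @Override
    public Optional<TableMetadata> getTableMetadata(String tableName) {
        return Optional.ofNullable(tables.get(tableName));
    }
}
```

Returning Optional from the lookup keeps "no metadata collected yet" distinct from an error, which matters if Drill has to fall back to reading the files directly.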

Possible solutions for some HMS limitations: 
* HMS can store only table and partition metadata, so the schema and statistics of Parquet, JSON, CSV, etc. files will be stored as partition metadata.
* HMS can store only certain kinds of column statistics, so we are considering other ways to store all available Drill statistics, for instance storing this data as table/partition properties or contributing to Hive Metastore to extend it.
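
As a rough sketch of the "table/partition properties" option, statistics could be flattened into string key/value pairs and read back later. The StatsProperties class and the "drill.stats." key prefix are assumptions made up for this illustration; they are not an existing Drill or Hive convention, and the code works on plain maps and does not call any Hive Metastore API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: flattens per-column statistics into string properties
// of the kind a table or partition could carry, and reads them back.
final class StatsProperties {
    // Assumed key prefix, chosen for this example only.
    static final String PREFIX = "drill.stats.";

    private StatsProperties() { }

    /** Turns {min=1, max=9} for column "id" into {drill.stats.id.min=1, drill.stats.id.max=9}. */
    static Map<String, String> encode(String column, Map<String, String> stats) {
        Map<String, String> props = new HashMap<>();
        for (Map.Entry<String, String> e : stats.entrySet()) {
            props.put(PREFIX + column + "." + e.getKey(), e.getValue());
        }
        return props;
    }

    /** Extracts the statistics of one column from a property map, ignoring unrelated keys. */
    static Map<String, String> decode(String column, Map<String, String> props) {
        String keyPrefix = PREFIX + column + ".";
        Map<String, String> stats = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(keyPrefix)) {
                stats.put(e.getKey().substring(keyPrefix.length()), e.getValue());
            }
        }
        return stats;
    }
}
```

One known limitation of this naive key scheme: column names that themselves contain a dot would be ambiguous, so a real implementation would need escaping or a structured value (for example JSON) per column.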


> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
>                 Key: DRILL-6552
>                 URL: https://issues.apache.org/jira/browse/DRILL-6552
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Metadata
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would enable Drill to remember previously defined schemata so Drill doesn’t have to do the same work over and over again.
> It would store schema and statistics, which would speed up query validation, planning and execution. It would also increase the stability of Drill and help avoid various kinds of issues: "schema change Exceptions", "limit 0" optimization and so on.
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from the Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs and plugin configs in some kind of metastore as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)