Posted to dev@drill.apache.org by "Joseph Barefoot (JIRA)" <ji...@apache.org> on 2015/07/31 19:35:04 UTC

[jira] [Created] (DRILL-3588) Write back to Hive Metastore

Joseph Barefoot created DRILL-3588:
--------------------------------------

             Summary: Write back to Hive Metastore
                 Key: DRILL-3588
                 URL: https://issues.apache.org/jira/browse/DRILL-3588
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Joseph Barefoot
            Priority: Critical


This feature is particularly important to us here at AtScale in order to leverage Drill as a query engine option for our BI on Hadoop solution. Currently, you can connect to the Hive Metastore and query its databases and tables without problems. However, if you create a table, it is created in HDFS but no metadata is written to the Hive Metastore. That means those tables won't be easily visible to any other tool.

When you read schemas from a Hive data source via Drill, they are prefixed with "hive.". This namespacing makes sense to us considering how Drill works, and ideally it would work symmetrically when you create tables with the same prefix, i.e. Drill would map the prefix to the target data source, in this case Hive, and write the schema information back to the Hive Metastore. Our specific use case is CREATE TABLE AS SELECT (CTAS); ideally, however, any DDL statement against a Hive data source schema or table would write back to the Hive Metastore.
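
To make the requested symmetry concrete, here is a minimal sketch of prefix-based routing on table creation. It is not Drill's implementation or API: `MetastoreStub`, `split_qualified_name`, and `create_table` are hypothetical names invented for illustration, and the assumption that a two-part name falls back to a database called "default" is ours, not something stated in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class MetastoreStub:
    """Hypothetical stand-in for a Hive Metastore client; it only records tables."""
    tables: dict = field(default_factory=dict)

    def register(self, database, table, location):
        self.tables[(database, table)] = location


def split_qualified_name(name, default_database="default"):
    """Split 'plugin.db.table' or 'plugin.table' into (plugin, database, table)."""
    parts = name.split(".")
    if len(parts) == 3:
        return tuple(parts)
    if len(parts) == 2:
        # Assumption: a missing database component means the default database.
        return parts[0], default_database, parts[1]
    raise ValueError(f"expected a plugin-qualified name, got {name!r}")


def create_table(name, location, metastores):
    """Write metadata for a new table to the metastore its prefix maps to.

    `metastores` maps a prefix such as 'hive' to a MetastoreStub.
    Returns True if metadata was written, False if the prefix has no metastore.
    """
    plugin, database, table = split_qualified_name(name)
    metastore = metastores.get(plugin)
    if metastore is None:
        return False  # data files only, no metadata written anywhere
    metastore.register(database, table, location)
    return True
```

In this sketch, creating "hive.sales.summary" records the table in the stub that the "hive" prefix maps to, while a name with an unmapped prefix leaves every metastore untouched. The second case corresponds to the behavior described above for all tables today.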

The reason it's important to have the metadata in the Hive Metastore is that we have found many of our customers use multiple SQL tools to access data tracked in the Metastore. For example, even if Impala is their primary SQL on Hadoop engine for clients/tools, they may run Spark jobs to manipulate data via RDDs that pull data by referencing the Metastore. Organizations using a lot of SQL on Hadoop have come to expect this sort of interoperability between Hive, Spark, and Impala, and supporting it within Drill will help drive adoption within the Hadoop community (besides making it a lot easier for us to use Drill effectively from within our BI engine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)