Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/01/31 02:18:00 UTC

[jira] [Created] (DRILL-7559) Generalize provided schema handling for non-DFS plugins

Paul Rogers created DRILL-7559:
----------------------------------

Summary: Generalize provided schema handling for non-DFS plugins
Key: DRILL-7559
URL: https://issues.apache.org/jira/browse/DRILL-7559
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.18.0
Reporter: Paul Rogers

Drill offers a "provided schema" mechanism, which is currently a work in progress.
DRILL-7458 (Base framework for storage plugins) shows how a custom scan can support a provided schema with a single line of code:

{code:java}
builder.typeConverterBuilder().providedSchema(subScan.getSchema());
{code}

The challenge, however, is how the plugin obtains the schema. At present, the process is complex and ad hoc:

* The plugin's schema factory looks up the schema in some plugin-specific way.
* The schema is then passed to the group scan as part of the scan spec.
* The group scan passes the provided schema to the sub scan.
* The sub scan carries the schema into the execution step so that, finally, the plugin can use the above line of code.
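
The hand-off chain above can be sketched roughly as follows. This is a minimal illustration only: every class here (SketchSchema, SketchScanSpec, SketchGroupScan, SketchSubScan, SketchSchemaFactory) is a hypothetical stand-in, not one of Drill's actual classes, and the schema is reduced to a list of column names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a provided schema: just an ordered list of column names.
final class SketchSchema {
  final List<String> columns;

  SketchSchema(String... columns) {
    this.columns = Collections.unmodifiableList(Arrays.asList(columns));
  }
}

// Hypothetical scan spec: identifies the table and carries the schema found at plan time.
final class SketchScanSpec {
  final String tableName;
  final SketchSchema schema;

  SketchScanSpec(String tableName, SketchSchema schema) {
    this.tableName = tableName;
    this.schema = schema;
  }
}

// Hypothetical sub scan: the piece that reaches the execution step.
final class SketchSubScan {
  private final SketchSchema schema;

  SketchSubScan(SketchSchema schema) {
    this.schema = schema;
  }

  SketchSchema getSchema() {
    return schema;
  }
}

// Hypothetical group scan: receives the scan spec and hands the schema to the sub scan.
final class SketchGroupScan {
  private final SketchScanSpec spec;

  SketchGroupScan(SketchScanSpec spec) {
    this.spec = spec;
  }

  SketchSubScan toSubScan() {
    return new SketchSubScan(spec.schema);
  }
}

// Hypothetical schema factory: the "plugin-specific" lookup is hard-coded here.
final class SketchSchemaFactory {
  SketchScanSpec resolve(String tableName) {
    SketchSchema schema = new SketchSchema("id", "name");
    return new SketchScanSpec(tableName, schema);
  }
}
```

Every plugin currently has to write its own version of each of these hops, which is the boilerplate this issue proposes to move into the Base framework.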

Needless to say, the developer experience is not quite as simple as it might be. In particular, the developer has to solve the complex problem of where to store the schema. DFS-based format plugins can use the existing file-based mechanism. Non-DFS plugins have no such choice.

So, the improvements we need are:

* Provide a reusable, shared schema registry that works even if Drill is not used with a DFS.
* Augment the SQL commands for creating a schema to use this new registry.
* Add the schema to the Base framework classes so it is automatically looked up from the registry, passed along the scan chain, and set on the reader builder at run time.
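
As one possible shape for the shared registry in the first item, here is a minimal sketch. The interface name, method names, and the choice to key schemas by plugin name and table name are all assumptions made for illustration; nothing here is an existing Drill API, and the schema is kept as an opaque string so that the sketch stays self-contained. The in-memory implementation only stands in for whatever persistent store is eventually chosen.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry contract: store and retrieve a schema per (plugin, table) pair.
interface SketchSchemaRegistry {
  void register(String pluginName, String tableName, String schemaText);

  Optional<String> lookup(String pluginName, String tableName);
}

// Hypothetical in-memory implementation; a real one would need persistent, shared storage.
final class InMemorySchemaRegistry implements SketchSchemaRegistry {
  // Outer key: plugin name. Inner key: table name. Value: serialized schema.
  private final Map<String, Map<String, String>> schemas = new ConcurrentHashMap<>();

  @Override
  public void register(String pluginName, String tableName, String schemaText) {
    schemas
        .computeIfAbsent(pluginName, k -> new ConcurrentHashMap<>())
        .put(tableName, schemaText);
  }

  @Override
  public Optional<String> lookup(String pluginName, String tableName) {
    Map<String, String> tables = schemas.get(pluginName);
    if (tables == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(tables.get(tableName));
  }
}
```

Returning an Optional keeps "no schema provided" distinct from an error, so a plugin with no registered schema can fall back to its normal schema discovery.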

Note that we can probably leverage the work done for the metastore API. A metastore generally stores two kinds of data: 1) schema and 2) stats. Perhaps we can implement a DB-based version for non-DFS configurations.
