Posted to user@drill.apache.org by Thomas Bünger <th...@googlemail.com> on 2017/01/27 22:43:03 UTC

Writing your own Storage Plugin

Hello,

I just started using Drill and would love to use it as a query engine for a custom data format.
The data format is actually a set of SQLite files that additionally contain hierarchical data in some fields (similar to JSON, but stored in a binary format).
Now, I want to learn how to write a custom storage plugin for this.

Therefore I’d like to know:

1) Is there any documentation / tutorial out there covering this topic?
2) Which existing storage plugin might be a good candidate to look at and learn from?
3) As SQLite is the underlying storage engine and is thus able to help with pushed-down selections and joins, can I implement partial pushdown, meaning just for some fields?
    SQLite can help with operations on regular SQLite columns, but it does not understand the binary blobs that represent the structured nested data. So can I tell Drill that
    pushdown is possible only for some parts of the schema?
4) If starting from an existing plugin, would you recommend starting from the JDBC plugin, a file-system/json plugin, or any other one?
5) Do I need to specify the schema fully (i.e. the full nested blob-column data types), or is there something like Drill-Data-type: "json" so that Drill does the inspection?

I would appreciate your advice and best regards,
 Thomas

Re: Writing your own Storage Plugin

Posted by Tugdual Grall <tu...@gmail.com>.
I am not sure I understand the "format", or why you cannot use the JDBC
plugin with a UDF to parse the blob.
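To make the UDF idea concrete, here is a very rough sketch of a blob-to-JSON function. Only the Drill function-template boilerplate is real; the function name decode_myblob and the decoding step itself are invented placeholders for your format:

import io.netty.buffer.DrillBuf;
import javax.inject.Inject;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.VarBinaryHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "decode_myblob",
                  scope = FunctionTemplate.FunctionScope.SIMPLE,
                  nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class DecodeMyBlob implements DrillSimpleFunc {

  @Param  VarBinaryHolder in;     // the binary blob column
  @Output VarCharHolder   out;    // JSON text handed back to Drill
  @Inject DrillBuf        buffer; // Drill-managed output buffer

  public void setup() { }

  public void eval() {
    // Copy the raw blob bytes out of the incoming value vector.
    byte[] raw = new byte[in.end - in.start];
    in.buffer.getBytes(in.start, raw);

    // Placeholder: decode the proprietary binary format into a JSON string.
    // Replace this line with the real blob-to-JSON conversion.
    byte[] json = raw;

    // Hand the UTF-8 JSON bytes back to Drill.
    buffer = buffer.reallocIfNeeded(json.length);
    buffer.setBytes(0, json);
    out.buffer = buffer;
    out.start = 0;
    out.end = json.length;
  }
}

Packaged in a jar (with a drill-module.conf that registers the package) and dropped into Drill's classpath, the function can then be called like any built-in.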

Some comments below


> 1) Is there any documentation / tutorial out there covering this topic?


No


> 2) Which existing storage plugin might be a good candidate to look at and
> learn from?


The MongoDB plugin is good for that.
But in your case you can probably use the JDBC plugin directly to call
SQLite and focus on the binary format.
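For reference, a JDBC storage plugin configuration pointed at a SQLite file might look roughly like this, assuming the Xerial sqlite-jdbc driver jar is on Drill's classpath; the path is of course made up:

{
  "type": "jdbc",
  "driver": "org.sqlite.JDBC",
  "url": "jdbc:sqlite:/path/to/data.db",
  "enabled": true
}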


> 3) As SQLite is the underlying storage engine and is thus able to help with
> pushed-down selections and joins, can I implement partial pushdown, meaning
> just for some fields?
>     SQLite can help with operations on regular SQLite columns, but it does
> not understand the binary blobs that represent the structured nested data.
> So can I tell Drill that
>     pushdown is possible only for some parts of the schema?


Can you give an example?

>
> 4) If starting from an existing plugin, would you recommend starting from
> the JDBC plugin, a file-system/json plugin, or any other one?


Since it looks like you only have to focus on some fields, you can look at
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store
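If you do end up writing your own plugin, the usual starting point in that package is a config class plus a plugin class. A minimal sketch of the config side is below; the "myblob" name and the dbDirectory property are invented for illustration, and base-class details vary a bit between Drill versions:

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonTypeName;
import java.util.Objects;
import org.apache.drill.common.logical.StoragePluginConfig;

@JsonTypeName(MyBlobStoragePluginConfig.NAME)
public class MyBlobStoragePluginConfig extends StoragePluginConfig {

  public static final String NAME = "myblob";

  // Directory holding the SQLite files (illustrative property).
  private final String dbDirectory;

  @JsonCreator
  public MyBlobStoragePluginConfig(@JsonProperty("dbDirectory") String dbDirectory) {
    this.dbDirectory = dbDirectory;
  }

  public String getDbDirectory() {
    return dbDirectory;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    return Objects.equals(dbDirectory, ((MyBlobStoragePluginConfig) o).dbDirectory);
  }

  @Override
  public int hashCode() {
    return Objects.hash(dbDirectory);
  }
}

The matching plugin class then exposes the schema and the scan; the Mongo and JDBC plugins in that same package show the full pattern.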


> 5) Do I need to specify the schema fully (i.e. the full nested blob-column
> data types), or is there something like Drill-Data-type: "json" so that
> Drill does the inspection?


Drill can do the inspection; look at the JSON and Parquet plugins.
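As a sketch of that inspection: once the blob is exposed as JSON text (for example through a UDF like the hypothetical decode_myblob above), CONVERT_FROM(..., 'JSON') lets Drill discover the nested structure at read time. The names below ("sqlite" plugin, "events" table, "payload" column) are invented, and the exact schema path depends on how the JDBC plugin surfaces SQLite's schemas:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NestedBlobQuery {
  public static void main(String[] args) throws Exception {
    // Connect to a local Drillbit through Drill's JDBC driver.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = conn.createStatement();
         // `sqlite` is the assumed storage plugin name, `events`/`id`/`payload`
         // are invented, and decode_myblob() is the hypothetical UDF from
         // earlier in this thread.
         ResultSet rs = stmt.executeQuery(
             "SELECT e.id, "
           + "       CONVERT_FROM(decode_myblob(e.payload), 'JSON') AS payload_json "
           + "FROM sqlite.events e")) {

      // Drill infers the nested structure of payload_json on the fly;
      // no column types had to be declared anywhere.
      while (rs.next()) {
        System.out.println(rs.getObject("id") + " -> " + rs.getObject("payload_json"));
      }
    }
  }
}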


> I would appreciate your advice and best regards,
>  Thomas