Posted to user@drill.apache.org by Ananth Gundabattula <ag...@threatmetrix.com> on 2014/11/23 22:47:20 UTC

Help regarding new storage plugin and Query parser documentation

Hello All,

I am experimenting with Drill for a project and have the following two questions:

1. Is there any documentation on how to write a storage plugin for Drill?

2. Is there any documentation on how to plug in a custom query parser? The data warehouse we are building is a mix of many systems, and a given query can be answered by more than one subsystem. We would like to analyze each query on the fly and route it to the subsystem best able to answer it. Drill seems to be an excellent option, as it also provides a JDBC driver. However, looking at the Drill wiki, I could not find any document that explains how to plug in both a new storage engine and a new query parser that meets the criterion of “rewriting an incoming query”.

Thanks a lot for your time.

Regards,
Ananth

Re: Help regarding new storage plugin and Query parser documentation

Posted by Ananth Gundabattula <ag...@threatmetrix.com>.

Thanks a lot for the pointers Tim. This is a good starting point for me.

Regards,
Ananth 

On 24/11/2014 8:56 am, "Timothy Chen" <tn...@gmail.com> wrote:

>Hi Ananth,
>
>Unfortunately there isn't any documentation listing all the specific
>steps to add a new storage plugin, although that would be super helpful
>for anyone interested.
>
>I think the easiest way is to look at the last two storage plugins
>that were added, as they're independent modules which are in
>contrib/storage-mongo and contrib/storage-hbase.
>
>The MongoDB plugin basically was added with one commit:
>2ca9c907bff639e08a561eac32e0acab3a0b3304
>
>About adding a custom query parser: it's definitely doable, but there
>isn't much documentation to refer to.
>
>The best example I can think of as a reference is probably
>what Yash did for adding Pig Latin support:
>
>https://github.com/yssharma/pig-on-drill/commit/b2d8a23c11d03974e16eb2ff44e021b1e957f03f
>
>Let us know if you need more help,
>
>Tim

Re: Help regarding new storage plugin and Query parser documentation

Posted by Timothy Chen <tn...@gmail.com>.

Hi Ananth,

Unfortunately there isn't any documentation listing all the specific
steps to add a new storage plugin, although that would be super helpful
for anyone interested.

I think the easiest way is to look at the last two storage plugins
that were added, as they're independent modules which are in
contrib/storage-mongo and contrib/storage-hbase.

The MongoDB plugin basically was added with one commit:
2ca9c907bff639e08a561eac32e0acab3a0b3304

About adding a custom query parser: it's definitely doable, but there
isn't much documentation to refer to.

The best example I can think of as a reference is probably
what Yash did for adding Pig Latin support:

https://github.com/yssharma/pig-on-drill/commit/b2d8a23c11d03974e16eb2ff44e021b1e957f03f

Let us know if you need more help,

Tim


Re: Help regarding new storage plugin and Query parser documentation

Posted by Ananth Gundabattula <ag...@threatmetrix.com>.

Hello Carol,

 > Which Systems 
We have a graph storage format on top of Cassandra and an indexing
system running as a set of Impala tables. We want to direct each query
to the sub-system that best fits it and then expose the result to the
external client as a simple JDBC result set. In the above example, we
fire the query on Impala, get back row keys/vertex IDs, and then serve full
rows, if requested, from the graph/Cassandra layer.
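
A minimal sketch of the two-step flow described above, in Python. Everything here is an assumption for illustration: the in-memory `INDEX_ROWS` and `FULL_ROWS` stand in for the Impala index tables and the graph/Cassandra layer, and the function names are hypothetical. It is not Drill code and does not use any Drill API.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical stand-in for the index tables (Impala in the setup described).
INDEX_ROWS: List[Row] = [
    {"vertex_id": "v1", "country": "AU"},
    {"vertex_id": "v2", "country": "US"},
    {"vertex_id": "v3", "country": "AU"},
]

# Hypothetical stand-in for the full rows (graph layer on Cassandra).
FULL_ROWS: Dict[str, Row] = {
    "v1": {"vertex_id": "v1", "country": "AU", "name": "alpha"},
    "v2": {"vertex_id": "v2", "country": "US", "name": "beta"},
    "v3": {"vertex_id": "v3", "country": "AU", "name": "gamma"},
}


def query_index(predicate: Callable[[Row], bool]) -> List[str]:
    """Step 1: ask the index layer which vertex IDs match."""
    return [row["vertex_id"] for row in INDEX_ROWS if predicate(row)]


def fetch_full_rows(vertex_ids: List[str]) -> List[Row]:
    """Step 2: look the matching IDs up in the row store."""
    return [FULL_ROWS[v] for v in vertex_ids if v in FULL_ROWS]


def answer(predicate: Callable[[Row], bool], want_full_rows: bool) -> List[Row]:
    """Return only the IDs, or the full rows when the client asks for them."""
    ids = query_index(predicate)
    if not want_full_rows:
        return [{"vertex_id": v} for v in ids]
    return fetch_full_rows(ids)
```

Calling `answer(lambda r: r["country"] == "AU", want_full_rows=True)` touches the row store only for the IDs the index returned, which is the point of splitting the work across the two sub-systems.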

>drill already does this for dfs, hive, hbase, mongo
By analyzing the query, I mean this: the indexing system might hold a
“reduced” format of the data, in which the actual values are replaced by
entries in a lookup table. This saves us a lot of space and allows better
caching strategies. Hence we would like to “rewrite” the incoming query
before firing it off to the Impala sub-system, extract the result, and
use it to process the query further. Hence the request for pluggability
of a custom query parser.
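
A rough Python sketch of the kind of rewrite described above, assuming the lookup table maps column values to integer codes stored in a `<column>_id` column. The `LOOKUP` contents, the `_id` naming convention, and the `rewrite` function are all hypothetical, and a regular expression is used only to keep the example short; a real rewriter would work on the parsed query tree rather than on the SQL text.

```python
import re

# Hypothetical lookup table: column name -> {actual value -> stored code}.
LOOKUP = {
    "browser": {"Firefox": 1, "Chrome": 2},
}

# Matches simple equality predicates of the form: column = 'value'
_PREDICATE = re.compile(r"\b(\w+)\s*=\s*'([^']*)'")


def rewrite(sql: str) -> str:
    """Replace literals that have a lookup code with `<column>_id = <code>`."""

    def substitute(match: "re.Match[str]") -> str:
        column, value = match.group(1), match.group(2)
        codes = LOOKUP.get(column)
        if codes is None or value not in codes:
            # No lookup entry: leave the predicate exactly as written.
            return match.group(0)
        return f"{column}_id = {codes[value]}"

    return _PREDICATE.sub(substitute, sql)
```

For example, `rewrite("SELECT vertex_id FROM events WHERE browser = 'Chrome'")` turns the predicate into `browser_id = 2`, while predicates on columns without a lookup entry pass through unchanged.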

I understand the above use cases are entirely specific to our project's
needs, hence I was wondering whether we can adapt Drill to them.

Regards,
Ananth 

On 25/11/2014 2:14 am, "Carol McDonald" <cm...@maprtech.com> wrote:

>> mix of many systems
>which systems?
>>analyzing the query on the fly and route it to the best sub-system
>drill already does this for dfs, hive, hbase, mongo

Re: Help regarding new storage plugin and Query parser documentation

Posted by Carol McDonald <cm...@maprtech.com>.

> mix of many systems
which systems?
>analyzing the query on the fly and route it to the best sub-system
drill already does this for dfs, hive, hbase, mongo
