Posted to user@drill.apache.org by Charles Givre <cg...@gmail.com> on 2019/11/05 03:23:32 UTC

[DISCUSS] Drill Storage Plugins

Hello all, 
I've written some UDFs and Format plugins for Drill and I'm interested in tackling a storage plugin.  One of my regrets from the Drill book was that we didn't get into this topic.  For those of you who have written one, my hat's off to you. I wanted to ask if there are any resources or tutorials available that you found particularly helpful?  I'm having a little trouble figuring out what all the pieces do and how they fit together.

Does anyone have any ideas about which storage plugins should be implemented?  Personally I'd really like to see one for ElasticSearch.
Best,
-- C

Re: [DISCUSS] Drill Storage Plugins

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,

Storage plugins are a bit complex because they integrate not just with the runtime engine, but also with the Calcite planning engine. Format plugins are simpler because they are mostly runtime-only. The "Easy" framework hides much of the planner integration, and the EVF "Easier" revisions hide the details even more.

Yes, we did choose to omit the storage plugins from the Drill book because of the large amount of complexity involved. I like the suggestion that we gather information about how to create a storage plugin and post it somewhere. If we do an "Expanded and Revised" edition of the book, we can incorporate the material at that time.

We may also want to create something like the "Easy" framework that hides (or at least simplifies) all the Calcite knick-knacks that we must currently fiddle with.

Finally, the simplest possible storage plugin is the "MockStoragePlugin" in Drill itself. This plugin uses some crazy tricks to generate random data which we use when testing operators, such as when we need a large number of rows to test sort spilling. The Mock plugin is a bit of a mess, but it at least mostly "factors out" any additional complexity that comes from interfacing with an external system.
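
To make the planner/runtime split concrete, here is a minimal, self-contained sketch in the spirit of the Mock plugin. It is not Drill's API: MockPluginSketch, ScanSlice, planSlices and readSlice are hypothetical names invented for illustration, and a real plugin would have to plug into Calcite and Drill's operator framework instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch only: none of these types are Drill classes.
public class MockPluginSketch {

    // Plan-time description of one slice of work handed to a single reader.
    public static final class ScanSlice {
        public final int sliceId;
        public final int rowCount;

        public ScanSlice(int sliceId, int rowCount) {
            this.sliceId = sliceId;
            this.rowCount = rowCount;
        }
    }

    // Plan time: divide the requested row count across parallel readers.
    public static List<ScanSlice> planSlices(int totalRows, int parallelism) {
        List<ScanSlice> slices = new ArrayList<>();
        int base = totalRows / parallelism;
        int remainder = totalRows % parallelism;
        for (int i = 0; i < parallelism; i++) {
            // The first `remainder` slices each take one extra row.
            slices.add(new ScanSlice(i, base + (i < remainder ? 1 : 0)));
        }
        return slices;
    }

    // Run time: fabricate random rows instead of calling an external system.
    public static List<int[]> readSlice(ScanSlice slice, int columns, long seed) {
        Random random = new Random(seed + slice.sliceId);
        List<int[]> rows = new ArrayList<>();
        for (int r = 0; r < slice.rowCount; r++) {
            int[] row = new int[columns];
            for (int c = 0; c < columns; c++) {
                row[c] = random.nextInt(1000);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        int total = 0;
        for (ScanSlice slice : planSlices(10, 3)) {
            total += readSlice(slice, 2, 42L).size();
        }
        System.out.println("rows generated: " + total);
    }
}
```

Running main plans three slices for ten rows and reads each one. The only point is that planning decides how the work is divided, while the reader produces the rows; everything that makes a real storage plugin hard lives in the integration this sketch leaves out.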

Thanks,
- Paul

 

    On Tuesday, November 5, 2019, 6:14:06 AM PST, Charles Givre <cg...@gmail.com> wrote:  
 
One more thing:  I've found code for storage plugins (in various states of completion) for the following systems:
DynamoDB (https://github.com/fineoio/drill-dynamo-adapter)
Apache Druid: (Current Draft PR https://github.com/apache/drill/pull/1888)
CouchDB: (https://github.com/LyleLeo/Apache-Drill-CouchDB-Storage-Plugin) (Author said he would consider submitting as PR)
ElasticSearch: https://github.com/javiercanillas/drill-storage-elastic, https://github.com/gaoshui87/drill-storage-elastic
Apache Solr

Are there others that anyone knows of?


> On Nov 4, 2019, at 10:23 PM, Charles Givre <cg...@gmail.com> wrote:
> 
> Hello all, 
> I've written some UDFs and Format plugins for Drill and I'm interested in tackling a storage plugin.  One of my regrets from the Drill book was that we didn't get into this topic.  For those of you who have written one, my hat's off to you. I wanted to ask if there are any resources or tutorials available that you found particularly helpful?  I'm having a little trouble figuring out what all the pieces do and how they fit together.
> 
> Does anyone have any ideas about which storage plugins should be implemented?  Personally I'd really like to see one for ElasticSearch.
> Best,
> -- C
  

Re: [DISCUSS] Drill Storage Plugins

Posted by Matt <bs...@gmail.com>.
Perhaps an "awesome-drill" repo on GitHub would be a place to backfill the
book, and serve as a central location for things like the list you supplied:

https://github.com/topics/awesome

On Tue, Nov 5, 2019 at 9:14 AM Charles Givre <cg...@gmail.com> wrote:

> One more thing:  I've found code for storage plugins (in various states of
> completion) for the following systems:
> DynamoDB (https://github.com/fineoio/drill-dynamo-adapter)
> Apache Druid: (Current Draft PR https://github.com/apache/drill/pull/1888)
> CouchDB: (https://github.com/LyleLeo/Apache-Drill-CouchDB-Storage-Plugin)
> (Author said he would consider submitting as PR)
> ElasticSearch: https://github.com/javiercanillas/drill-storage-elastic,
> https://github.com/gaoshui87/drill-storage-elastic
> Apache Solr
>
> Are there others that anyone knows of?
>
>
> > On Nov 4, 2019, at 10:23 PM, Charles Givre <cg...@gmail.com> wrote:
> >
> > Hello all,
> > I've written some UDFs and Format plugins for Drill and I'm interested
> in tackling a storage plugin.  One of my regrets from the Drill book was
> that we didn't get into this topic.  For those of you who have written one,
> my hat's off to you. I wanted to ask if there are any resources or
> tutorials available that you found particularly helpful?  I'm having a
> little trouble figuring out what all the pieces do and how they fit
> together.
> >
> > Does anyone have any ideas about which storage plugins should be
> implemented?  Personally I'd really like to see one for ElasticSearch.
> > Best,
> > -- C
>
>

Re: [DISCUSS] Drill Storage Plugins

Posted by Charles Givre <cg...@gmail.com>.
One more thing:  I've found code for storage plugins (in various states of completion) for the following systems:
DynamoDB (https://github.com/fineoio/drill-dynamo-adapter)
Apache Druid: (Current Draft PR https://github.com/apache/drill/pull/1888)
CouchDB: (https://github.com/LyleLeo/Apache-Drill-CouchDB-Storage-Plugin) (Author said he would consider submitting as PR)
ElasticSearch: https://github.com/javiercanillas/drill-storage-elastic, https://github.com/gaoshui87/drill-storage-elastic
Apache Solr

Are there others that anyone knows of?


> On Nov 4, 2019, at 10:23 PM, Charles Givre <cg...@gmail.com> wrote:
> 
> Hello all, 
> I've written some UDFs and Format plugins for Drill and I'm interested in tackling a storage plugin.  One of my regrets from the Drill book was that we didn't get into this topic.  For those of you who have written one, my hat's off to you. I wanted to ask if there are any resources or tutorials available that you found particularly helpful?  I'm having a little trouble figuring out what all the pieces do and how they fit together.
> 
> Does anyone have any ideas about which storage plugins should be implemented?  Personally I'd really like to see one for ElasticSearch.
> Best,
> -- C

